Abstract

We investigate the classes of functions whose minimization diagrams can be approximated efficiently in $\mathbb{R}^d$. We present a general framework and a data-structure that can be used to approximate the minimization diagram of such functions. The resulting data-structure has near linear size and can answer queries in logarithmic time. Applications include approximating the Voronoi diagram of (additively or multiplicatively) weighted points. Our technique also works for more general distance functions, such as metrics induced by convex bodies, and the nearest furthest-neighbor distance to a set of point sets. Interestingly, our framework works also for distance functions that do not comply with the triangle inequality. For many of these functions no near-linear size approximation was known before.



1. Introduction 



Given a set of functions $\mathcal{F} = \{ f_i : \mathbb{R}^d \to \mathbb{R} \mid i = 1, \ldots, n \}$, their minimization diagram is the function $f_{\min}(q) = \min_{i=1,\ldots,n} f_i(q)$, for any $q \in \mathbb{R}^d$. By viewing the graphs of these functions as manifolds in $\mathbb{R}^{d+1}$, the graph of the minimization diagram, also known as the lower envelope of $\mathcal{F}$, is the manifold that can be viewed from an observer at $-\infty$ on the $x_{d+1}$ axis. Given a set of functions $\mathcal{F}$ as above, many problems in Computational Geometry can be viewed as computing the minimization diagram; that is, one preprocesses $\mathcal{F}$, and given a query point $q$, one needs to compute $f_{\min}(q)$ quickly. This typically requires $n^{O(d)}$ space if one is interested in logarithmic query time. If one is restricted to using linear space, then the query time deteriorates to $O(n^{1-1/O(d)})$ [Mat92, Cha10]. There is substantial work on bounding the complexity of the lower envelope in various cases, on how to compute it efficiently, and on performing range searching on it; see the book by Sharir and Agarwal [SA95].

Nearest neighbor. One natural problem that falls into this framework is the nearest neighbor (NN) search problem. Here, given a set $P$ of $n$ data points in a metric space $X$, we need to preprocess $P$, such that given a query point $q \in X$, one can find (quickly) the point $n_q \in P$ closest to $q$. Nearest neighbor search is a fundamental task used in numerous domains including machine learning, clustering, document retrieval, databases, statistics, and many others.

To see the connection to lower envelopes, consider a set of data points $P = \{p_1, \ldots, p_n\}$ in $\mathbb{R}^d$. Next, consider the set of functions $\mathcal{F} = \{f_1, \ldots, f_n\}$, where $f_i(q) = \|q - p_i\|$, for $i = 1, \ldots, n$. The graph of $f_i$ is the set of points $\{(q, f_i(q)) \mid q \in \mathbb{R}^d\}$ (which is a cone in $\mathbb{R}^{d+1}$ with apex at $(p_i, 0)$). Clearly the NN problem is to evaluate the minimization diagram of the functions at a query point $q$.

More generally, given a set of n functions, one can think of the minimization diagram defining 
a "distance function", by analogy with the above. The distance of a query point here is simply the 
"height" of the lower envelope at that point. 

Exact nearest neighbor. The exact nearest neighbor problem has a naive linear time algorithm without any preprocessing. However, by doing some nontrivial preprocessing, one can achieve a sub-linear query time. In $\mathbb{R}^d$, this is facilitated by answering point location queries using a Voronoi diagram [dBCvKO08]. However, this approach is only suitable for low dimensions, as the complexity of the Voronoi diagram is $\Theta(n^{\lceil d/2 \rceil})$ in the worst case. Specifically, Clarkson [Cla88] showed a data-structure with $O(\log n)$ query time and $O(n^{\lceil d/2 \rceil + \delta})$ space, where $\delta > 0$ is a prespecified constant (the $O(\cdot)$ notation here hides constants that are exponential in the dimension). One can trade off the space used and the query time [AM93]. Meiser [Mei93] provided a data-structure with query time $O(d^5 \log n)$ (which has polynomial dependency on the dimension), where the space used is $O(n^{d+\delta})$. These solutions are impractical even for data-sets of moderate size if the dimension is larger than two.

Approximate nearest neighbor. In typical applications, however, it is usually sufficient to return an approximate nearest neighbor (ANN). Given an $\varepsilon > 0$, a $(1 + \varepsilon)$-ANN to a query point $q$ is a point $y \in P$ such that

$$\|q - y\| \le (1 + \varepsilon)\, \|q - n_q\|,$$

where $n_q \in P$ is the nearest neighbor to $q$ in $P$. A considerable amount of work has been done on this problem; see [Cla06] and references therein.

In high dimensional Euclidean space, Indyk and Motwani showed that ANN can be reduced to a small number of near neighbor queries [IM98, HIM12]. Next, using locality sensitive hashing they provide a data-structure that answers ANN queries in time (roughly) $\widetilde{O}(n^{1/(1+\varepsilon)})$ and preprocessing time and space $\widetilde{O}(n^{1+1/(1+\varepsilon)})$; here the $\widetilde{O}(\cdot)$ hides terms polynomial in $\log n$ and $1/\varepsilon$. This was improved to $\widetilde{O}(n^{1/(1+\varepsilon)^2})$ query time, and preprocessing time and space $\widetilde{O}(n^{1+1/(1+\varepsilon)^2})$ [AI08]. These bounds are near optimal [MNP06].

In low dimensions (i.e., $\mathbb{R}^d$ for small $d$), one can use linear space (independent of $\varepsilon$) and get ANN query time $O(\log n + 1/\varepsilon^{d-1})$ [AMN+98, Har11]. The trade-off for this logarithmic query time is of course an exponential dependence on $d$. Interestingly, for this data-structure, the approximation parameter $\varepsilon$ is not prespecified during the construction; one needs to provide it only during the query. An alternative approach is to use Approximate Voronoi Diagrams (AVD), introduced by Har-Peled [Har01], which is a partition of space into regions, of near-linear total complexity, typically with a representative point for each region that is an ANN for any point in the region. In particular, Har-Peled showed that there is such a decomposition of size $O((n/\varepsilon^d) \log^2 n)$, such that ANN queries can be answered in $O(\log n)$ time. Arya and Malamatos [AM02] showed how to build AVDs of linear complexity (i.e., $O(n/\varepsilon^d)$). Their construction uses Well-Separated Pair Decomposition [CK95]. Further trade-offs between query time and space for AVDs were studied by Arya et al. [AMM09].



Generalized distance functions: motivation. The algorithms for approximate nearest neighbor extend to various metrics in $\mathbb{R}^d$, for example the well known $\ell_p$ metrics. In particular, previous constructions of AVDs extend to $\ell_p$ metrics [Har01, AM02] as well. However, these constructions fail even for a relatively simple and natural extension; specifically, multiplicative weighted Voronoi diagrams. Here, every site $p$ in the given point set $P$ has a weight $w_p$, and the "distance" of a query point $q$ to $p$ is $f_p(q) = w_p \|q - p\|$. The function $f_p$ is the natural distance function induced by $p$. As with ordinary Voronoi diagrams, one can define the weighted Voronoi diagram as a partition of space into disjoint regions, one for each site $p$, such that in the region for $p$ the function $f_p$ is
the one realizing the minimum among all the functions induced by the points of P. It is known 
that, even in the plane, multiplicative Voronoi diagrams can have quadratic complexity, and the 
minimizing distance function usually does not comply with the triangle inequality. Intuitively, such 
multiplicative Voronoi diagrams can be used to model facilities where the price of delivery to a client 
depends on the facility and the distance. Of course, this is only one possible distance function, and 
there are many other such functions that are of interest (e.g., multiplicative, additive, etc.). 

When fast proximity and small space is not possible. Consider a set of segments in the plane, where we are interested in the nearest segment to a query point. Given $n$ such segments and $n$ such query points, this is an extension of Hopcroft's problem, which requires only to decide if any of the given points lies on any of the segments. There are lower bounds (in reasonable models) that show that Hopcroft's problem cannot be solved faster than $\Omega(n^{4/3})$ time [Eri96]. This implies that no multiplicative-error approximation for proximity search in this case is possible, if one insists on near linear preprocessing and logarithmic query time.

When is fast ANN possible. So, consider a set of geometric objects where each one of them 
induces a natural distance function, measuring how far a point in space is from this object. Given 
such a collection of functions, the nearest neighbor for a query point is simply the function that 
defines the lower envelope "above" the query point (i.e., the object closest to the query point under its 
distance function) . Clearly, this approach allows a generalization of the proximity search problem. 
In particular, the above question becomes: for what classes of functions can the lower envelope be approximated up to $(1 + \varepsilon)$-multiplicative error, in logarithmic time? Here the preprocessing space used by the data structure should be near linear.

1.1. Our results 

We characterize the conditions that are sufficient to approximate efficiently the minimization diagram of functions. Using this framework, one can quickly and approximately evaluate the lower envelope for large classes of functions that arise naturally from proximity problems. Our data-structure can be constructed in near linear time, uses near linear space, and answers proximity queries in logarithmic time (in constant dimension). Our framework is quite general and should be applicable to many distance functions, and in particular we present the following specific cases where the new data-structure can be used:

(A) Multiplicative Voronoi diagrams. Given a set of points $P$, where the $i$th point $p_i$ has associated weight $w_i > 0$, for $i = 1, \ldots, n$, consider the functions $f_i(q) = w_i \|q - p_i\|$. The minimization diagram for this set of functions corresponds to the multiplicative weighted Voronoi diagram of the points. The approach of Arya and Malamatos [AM02] to construct AVDs using WSPDs fails for this problem, as that construction relies on the triangle inequality that the regular Euclidean distance possesses, which does not hold in this case.
We provide a near linear space AVD construction for this case. We are unaware of any previous results on AVD for multiplicatively weighted Voronoi diagrams.

(B) Minkowski norms of fat convex bodies. Given a bounded symmetric convex body $C$ centered at the origin, it defines a natural metric; that is, for points $u$ and $v$ their distance, as induced by $C$, denoted by $\|u - v\|_C$, is the minimum $x$ such that $xC + u$ contains $v$. So, given a set of $n$ data points $P = \{p_1, \ldots, p_n\}$ and $n$ centrally symmetric and bounded convex bodies $C_1, \ldots, C_n$, we define $f_i(q) = \|p_i - q\|_{C_i}$, for $i = 1, \ldots, n$. Since each point induces a distance by a different convex body, this collection no longer defines a metric, and this makes the problem significantly more challenging. In particular, existing techniques for AVD and ANN cannot be readily applied. Intuitively, the fatness of the associated convex bodies turns out to be sufficient to approximate the associated distance function, see Section 5.2. The negative example for the case of segments presented above indicates that this condition is also necessary.

(C) Nearest furthest-neighbor. Consider a situation where the given input is uncertain; specifically, for the $i$th point we are given a set of points $P_i \subseteq \mathbb{R}^d$ where it might lie (the reader might consider the case where the $i$th point randomly chooses its location out of the points of $P_i$). There is a growing interest in how to handle such inputs, as real world measurements are fraught with uncertainty, see [DRS09, Agg09, AESZ12, AAH+13] and references therein. In particular, in the worst case, the distance of the query point $q$ to the $i$th point is the distance from $q$ to the furthest-neighbor of $q$ in $P_i$; that is, $\mathsf{F}_i(q) = \max_{p \in P_i} \|q - p\|$. Thus, in the worst case, the distance to the nearest point to the query is $\mathsf{F}(q) = \min_i \mathsf{F}_i(q)$. Using our framework we can approximate this function efficiently, using space $O(n)$ and providing logarithmic query time. Note that, surprisingly, the space requirement is independent of the original input size, and only depends on the number of uncertain points.

Paper organization. In Section 2 we define our framework and prove some basic properties. 

Since we are trying to make our framework as inclusive as possible, its description is somewhat 
abstract. In Section 4, we describe the construction of the AVD and its associated data-structure. 
We describe in Section 5 some specific cases where the new AVD construction can be used. We 
conclude in Section 6. 

2. Preliminaries 

For the sake of simplicity of exposition, throughout the paper we assume that all the "action" takes place in the unit cube $[0, 1]^d$. Among other things this implies that all the queries are in this region.
This can always be guaranteed by an appropriate scaling and translation of space. The scaling and 
translation, along with the conditions on functions in our framework, implies that outside the unit 
cube the approximation to the lower envelope can be obtained in constant time. 

2.1. Informal description of the technique 

Consider $n$ points in the plane $p_1, \ldots, p_n$, where the "distance" from the $i$th point to a query $q$ is the minimum scaling of an ellipse $\mathcal{E}_i$ (centered at $p_i$) until it covers $q$, and let $f_i$ denote this distance function. Assume that these ellipses are fat. Clearly each function $f_i$ defines a deformed cone. Given a query point $q \in \mathbb{R}^2$, we are interested in the first function graph being hit by a vertical ray shot upward from $(q, 0)$. In particular, let $f_{\min}(q) = \min_{i=1,\ldots,n} f_i(q)$ be the minimization diagram of these functions.
As a first step to computing $f_{\min}(q)$, consider the decision version of this problem. Given a value $r$, we are interested in deciding if $f_{\min}(q) \le r$. That is, we want to decide if $q \in \bigcup_i (p_i + r\mathcal{E}_i)$. Of course, this is by itself a computationally expensive task, and as such we satisfy ourselves with an approximate decision procedure. Formally, we replace every ellipse by a collection of grid cells (of the right resolution), such that approximately it is enough to decide if the query point lies inside any of these grid cells: if it does, we know that $f_{\min}(q) \le (1 + \varepsilon)r$; otherwise $f_{\min}(q) > r$. Of course, as depicted in the accompanying figure, since the ellipses are of different sizes, the grid cells generated for each ellipse might belong to different resolutions, and might be of different sizes. Nevertheless, one can perform this point-location query among the marked grid squares quickly using a compressed quadtree.

If we were interested only in the case where $f_{\min}(q)$ is guaranteed to be in some interval $[\alpha, \beta]$, then the problem would be easily solvable. Indeed, build a sequence of the above deciders $\mathcal{D}_1, \ldots, \mathcal{D}_m$, where $\mathcal{D}_i$ is for the distance $(1 + \varepsilon)^i \alpha$, and $m = \log_{1+\varepsilon}(\beta/\alpha)$. Clearly, doing a binary search over these deciders with the query point would resolve the distance query.
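The binary search over deciders can be sketched as follows. This is only an illustration under simplifying assumptions: the helper names `make_decider` and `search_distance` are hypothetical, each decider here is an exact brute-force test standing in for the approximate grid-based decider described above, and the caller is assumed to guarantee that $\alpha \le f_{\min}(q) \le \beta$.

```python
import math

def make_decider(funcs, r):
    """Exact stand-in for the approximate decider at distance r:
    accepts q iff some function in funcs has value at most r at q."""
    return lambda q: any(f(q) <= r for f in funcs)

def search_distance(funcs, q, alpha, beta, eps):
    """Binary search over deciders for the distances (1+eps)^i * alpha.

    Assumes alpha <= f_min(q) <= beta. Returns the smallest distance in the
    geometric sequence whose decider accepts q.
    """
    m = math.ceil(math.log(beta / alpha, 1.0 + eps))
    deciders = [make_decider(funcs, (1.0 + eps) ** i * alpha)
                for i in range(m + 1)]
    lo, hi = 0, m  # invariant: deciders[hi] accepts q
    while lo < hi:
        mid = (lo + hi) // 2
        if deciders[mid](q):
            hi = mid
        else:
            lo = mid + 1
    return (1.0 + eps) ** lo * alpha

# Two Euclidean distance functions induced by the points (0,0) and (3,0).
funcs = [lambda q: math.dist(q, (0.0, 0.0)), lambda q: math.dist(q, (3.0, 0.0))]
estimate = search_distance(funcs, (1.0, 0.0), alpha=0.1, beta=10.0, eps=0.1)
```

Because the deciders are monotone in the distance, the search needs only $O(\log m)$ decider calls, and the returned value overestimates $f_{\min}(q)$ by a factor of at most $1 + \varepsilon$.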

Sketchable. Unfortunately, in general, there is no such guarantee, which makes the problem significantly more challenging. Fortunately, for truly "large" distances a collection of such ellipses looks like a constant number of ellipses (at least in the approximate case). In the example of the figure above, for a large enough distance, the ellipses look like a single ellipse, as demonstrated in the figure on the right. Slightly more formally, if $\bigcup_i (p_i + r\mathcal{E}_i)$ is connected, then the set $\bigcup_i (p_i + R\mathcal{E}_i)$ can be $(1 + \varepsilon)$-approximated by a constant number of these ellipses, if $R > \Omega(nr/\varepsilon)$. A family of functions having this property is sketchable. This suggests the problem is easy for very large distances.




Critical values to search over. The above suggests that connectivity is the underlying property that enables us to simplify and replace a large set of ellipses by a few ellipses, if we are looking at them from sufficiently far. This implies that the critical values at which the level-set of the functions changes its connectivity are the values we should search over during the nearest neighbor search. Specifically, let $r_i$ be the minimal $r$ such that the set $\bigcup_{k=1}^{n} (p_k + r\mathcal{E}_k)$ has $n - i$ connected components, and let $r_1 \le r_2 \le \cdots \le r_n$ be the resulting sequence. Using the above decision procedure and a binary search, we can find the index $j$ such that $r_j \le f_{\min}(q) < r_{j+1}$. Furthermore, the decision procedure for the distance $r_j$ reports which connected component of $\bigcup_{k=1}^{n} (p_k + r_j\mathcal{E}_k)$ contains the query point $q$. Assume this connected component is formed by the first $t$ functions; that is, $\bigcup_{k=1}^{t} (p_k + r_j\mathcal{E}_k)$ is connected and contains $q$. There are two possibilities:

(A) If $f_{\min}(q) \in [r_j, c_a (t/\varepsilon) r_j]$, then a binary search with the decision procedure would approximate $f_{\min}(q)$, where $c_a$ is some constant.

(B) If $f_{\min}(q) > c_a (t/\varepsilon) r_j$, then this whole cluster of functions can be sketched and replaced by a constant number of representative functions, and the nearest-neighbor search can now be resolved directly by checking, for each function in the sketch, the distance of the query point from it.



2.1.1. Challenges 

There are several challenges in realizing the above scheme: 



(A) We are interested in more general distance functions. To this end, we carefully formalize 
what conditions the underlying distance functions induced by each point has to fulfill so 
that our framework applies. 

(B) The above scheme requires (roughly) quadratic space to be realized. To reduce the space to near linear, we need to be more aggressive about replacing clusters of points/functions by sketches. To this end, we replace our global scheme by a recursive scheme that starts with the "median" critical value, and forks the search at this value using the decision procedure. Now, when continuing the search above this value, we replace every cluster (at this resolution) by its sketch.

(C) Computing this "median" value directly is too expensive. Instead we randomly select a function, and compute the connectivity radius of this single distance function with the remaining functions. With good probability this value turns out to be good.

(D) We need to be very careful to avoid accumulation of error as we replace clusters by sketches.

2.2. Notations and basic definitions 

Given $q \in \mathbb{R}^d$ and $P \subseteq \mathbb{R}^d$ a non-empty closed set, the distance of $q$ to $P$ is $\mathsf{d}(q, P) = \min_{x \in P} \|q - x\|$.

For a number $\ell > 0$, the grid of side-length $\ell$, denoted by $\mathsf{G}_\ell$, is the natural tiling of $\mathbb{R}^d$ with cubes of side-length $\ell$ (i.e., with a vertex at the origin). A cube $\square$ is canonical if it belongs to $\mathsf{G}_\ell$, $\ell$ is a power of 2, and $\square \subseteq [0, 1]^d$. Informally, a canonical cube (or cell) is a region that might correspond to a cell in a quadtree having the unit cube as the root region.

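Definition 2.1. To approximate a set $X \subseteq [0, 1]^d$ up to distance $r$, consider the set $\mathsf{G}_{\approx r}(X)$ of all the canonical grid cells of $\mathsf{G}_\ell$ that have a non-empty intersection with $X$, where $\ell = 2^{\lfloor \log_2 (r/\sqrt{d}) \rfloor}$. Let $\cup\mathsf{G}_{\approx r}(X) = \bigcup_{\square \in \mathsf{G}_{\approx r}(X)} \square$ denote the union of cubes of $\mathsf{G}_{\approx r}(X)$.

Observe that $X \subseteq \cup\mathsf{G}_{\approx r}(X) \subseteq X \oplus \mathsf{B}(0, r)$, where $\oplus$ denotes the Minkowski sum, and $\mathsf{B}(0, r)$ is the ball of radius $r$ centered at the origin.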
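As an illustration of Definition 2.1, the following sketch computes the grid approximation of a Euclidean ball. The function names `grid_side` and `cells_of_ball` are hypothetical, cells are represented by their integer index vectors, and for brevity the sketch does not enforce that the cells lie inside the unit cube.

```python
import math
from itertools import product

def grid_side(r, d):
    """Side length 2^floor(log2(r / sqrt(d))): the largest power of two
    for which a grid cube has diameter at most r."""
    return 2.0 ** math.floor(math.log2(r / math.sqrt(d)))

def cells_of_ball(center, radius, r):
    """Return (side, cells): the grid side length and the index vectors of
    all grid cells of that side that intersect the closed ball."""
    d = len(center)
    side = grid_side(r, d)
    # Only cells inside the bounding box of the ball can intersect it.
    ranges = [range(math.floor((c - radius) / side),
                    math.floor((c + radius) / side) + 1) for c in center]
    cells = []
    for idx in product(*ranges):
        # Squared distance from the ball center to the closed cube idx.
        gap2 = 0.0
        for c, i in zip(center, idx):
            lo, hi = i * side, (i + 1) * side
            delta = max(lo - c, 0.0, c - hi)
            gap2 += delta * delta
        if gap2 <= radius * radius:
            cells.append(idx)
    return side, cells

side, cells = cells_of_ball((0.5, 0.5), 0.1, 0.3)
```

Since every reported cell has diameter at most $r$ and touches the ball, the union of the cells contains the ball and is contained in its Minkowski sum with $\mathsf{B}(0, r)$, matching the observation above.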

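Definition 2.2. For $\ell \ge 0$ and a function $f : \mathbb{R}^d \to \mathbb{R}$, the $\ell$ sublevel set of $f$ is the set $f_{\preceq \ell} = \{ p \in \mathbb{R}^d \mid f(p) \le \ell \}$. For a set of functions $\mathcal{F}$, let $\mathcal{F}_{\preceq \ell} = \bigcup_{f \in \mathcal{F}} f_{\preceq \ell}$.

Definition 2.3. Given a function $f$ and $q \in \mathbb{R}^d$, their distance$_f$ is $\mathsf{d}(q, f) = f(q)$. Given two functions $f$ and $g$, their distance$_f$ $\mathsf{d}(f, g)$ is the minimum $\ell \ge 0$ such that $f_{\preceq \ell} \cap g_{\preceq \ell} \ne \emptyset$. Similarly, for two sets of functions $\mathcal{F}$ and $\mathcal{G}$, their distance$_f$ is

$$\mathsf{d}(\mathcal{F}, \mathcal{G}) = \min_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathsf{d}(f, g).$$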

Example 2.4. To decipher these somewhat cryptic definitions, the reader might want to consider the standard setting of regular Voronoi diagrams. Here, we have a set $P$ of $n$ points. The $i$th point $p_i \in P$ induces the natural function $f_i(q) = \|q - p_i\|$. We have:

(A) The graph of $f_i$ in $\mathbb{R}^{d+1}$ is a cone "opening upwards" with an apex at $(p_i, 0)$.

(B) The $\ell$ sublevel set of $f_i$ (i.e., $(f_i)_{\preceq \ell}$) is a ball of radius $\ell$ centered at $p_i$.

(C) The distance$_f$ of $q$ from $f_i$ is the Euclidean distance between $q$ and $p_i$.

(D) Consider two subsets of points $X, Y \subseteq P$ and let $\mathcal{F}_X$ and $\mathcal{F}_Y$ be the corresponding sets of functions. The distance$_f$ $\ell = \mathsf{d}(\mathcal{F}_X, \mathcal{F}_Y)$ is the minimum radius of balls centered at points of $X$ and $Y$, such that there are two balls from the two sets that intersect; that is, $\ell$ is half the minimum distance between a point of $X$ and a point of $Y$. In particular, if the union of balls of radius $\ell$ centered at $X$ is connected, i.e., $(\mathcal{F}_X)_{\preceq \ell}$ is connected, and similarly for $Y$, then $(\mathcal{F}_X \cup \mathcal{F}_Y)_{\preceq \ell}$ is connected. This is the critical value where two connected components of the sublevel set merge.

The distance$_f$ function behaves to some extent like a distance function: (i) $\mathsf{d}(f, g)$ always exists, and (ii) (symmetry) $\mathsf{d}(f, g) = \mathsf{d}(g, f)$. Also, we have $f_{\preceq \mathsf{d}(f, g)} \ne \emptyset$. We extend the above definition to sets of functions. Note that the triangle inequality does not hold for $\mathsf{d}(\cdot, \cdot)$.

Observation 2.5. Suppose that $f$ and $g$ are two functions such that $\mathsf{d}(f, g) > 0$ and $q \in \mathbb{R}^d$. Then, $\max(\mathsf{d}(q, f), \mathsf{d}(q, g)) \ge \mathsf{d}(f, g)$.

Definition 2.6. Let $B_1, B_2, \ldots, B_m$ be $m$ connected, nonempty sets in $\mathbb{R}^d$. This collection of sets is connected if $\bigcup_i B_i$ is connected.

2.2.1. Sketches 

A key idea underlying our approach is that any set of functions of interest should look like a single function (or a small number of functions) from "far" enough. Indeed, a set of points $P \subseteq \mathbb{R}^d$ looks like a single point (as far as distance is concerned) if the distance from $\mathcal{CH}(P)$ is at least $2\,\mathrm{diam}(P)/\varepsilon$.

Definition 2.7 ($\mathrm{cl}(\mathcal{G})$). Given a set of functions $\mathcal{G}$, if $\mathcal{G}$ contains a single function then the connectivity level $\mathrm{cl}(\mathcal{G})$ is 0; otherwise, it is the minimum $\ell \ge 0$ such that the collection of sets $f_{\preceq \ell}$ for $f \in \mathcal{G}$ is connected, see Definition 2.6.

Remark 2.8. It follows from Definition 2.7 that at level $\ell = \mathrm{cl}(\mathcal{G})$, each of the sets $f_{\preceq \ell}$ for $f \in \mathcal{G}$ is nonempty and connected, and further their union $\mathcal{G}_{\preceq \ell}$ is also connected. This can be relaxed to require that the intersection graph of the sets $f_{\preceq \ell}$ for $f \in \mathcal{G}$ is connected (this also implies they are nonempty). Notice that if at level $\ell$ the sublevel sets are connected, then the relaxed definition is equivalent to Definition 2.7. However, the relaxed definition introduces more technical baggage, and for all the interesting applications we have, the sublevel sets $f_{\preceq y}$ are connected at all levels $y$ at which they are nonempty. Therefore, in the interest of brevity, and to keep the presentation simple, we mandate that the sublevel sets be connected at $\ell$. In fact, it would not harm to assume that the sublevel sets are connected whenever nonempty.

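Definition 2.9. Given a set of functions $\mathcal{G}$ and $\delta > 0$, $y_0 \ge 0$, a $(\delta, y_0)$-sketch for $\mathcal{G}$ is a (hopefully small) subset $\mathcal{H} \subseteq \mathcal{G}$, such that $\mathcal{G}_{\preceq y} \subseteq \mathcal{H}_{\preceq (1+\delta)y}$, for all $y \ge y_0$.

It is easy to see that for any $\mathcal{G}$, $\delta > 0$, $y_0 \ge 0$, if $\mathcal{H} \subseteq \mathcal{G}$ is a $(\delta, y_0)$-sketch, then for any $\delta' \ge \delta$, $y_0' \ge y_0$, and $\mathcal{H}'$ with $\mathcal{H} \subseteq \mathcal{H}' \subseteq \mathcal{G}$, it is true that $\mathcal{H}'$ is a $(\delta', y_0')$-sketch for $\mathcal{G}$. Trivially, for any $\delta > 0$, $y_0 \ge 0$, it is true that $\mathcal{H} = \mathcal{G}$ is a $(\delta, y_0)$-sketch.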
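To make the sketch condition concrete, here is a small pointwise test for the Euclidean case $f_i(q) = \|q - p_i\|$. It relies on an observation that follows from the definition: the inclusion $\mathcal{G}_{\preceq y} \subseteq \mathcal{H}_{\preceq (1+\delta)y}$ for all $y \ge y_0$ holds exactly when, at every point $q$, $\min_{h \in \mathcal{H}} h(q) \le (1+\delta) \max(y_0, \min_{g \in \mathcal{G}} g(q))$. The names `f_min` and `sketch_holds_at` are hypothetical, and testing sample points can only refute a sketch, not certify one.

```python
import math

def f_min(points, q):
    """Value at q of the minimization diagram of the Euclidean distance
    functions induced by the given points."""
    return min(math.dist(q, p) for p in points)

def sketch_holds_at(G, H, delta, y0, q):
    """Check the (delta, y0)-sketch condition at the single point q.

    The smallest level y >= y0 at which q enters the sublevel set of G is
    max(y0, f_min(G, q)); the condition at larger levels is weaker.
    """
    y = max(y0, f_min(G, q))
    return f_min(H, q) <= (1.0 + delta) * y

G = [(0.0, 0.0), (0.1, 0.0)]   # two nearby sites
H = [(0.0, 0.0)]               # candidate sketch: a single representative
far_ok = sketch_holds_at(G, H, delta=0.05, y0=4.0, q=(10.0, 0.0))
near_ok = sketch_holds_at(G, H, delta=0.05, y0=4.0, q=(0.1, 0.0))
near_fails_without_threshold = not sketch_holds_at(G, H, 0.05, 0.0, (0.1, 0.0))
```

The example shows why the threshold $y_0$ matters: a single representative is a fine substitute for the cluster at large levels, but not at levels comparable to the cluster's diameter.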

2.3. Conditions on the functions 

We require that the set of functions under consideration satisfy the following conditions. 

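(P1) Compactness. For any $y \ge 0$ and $i = 1, \ldots, n$, the set $(f_i)_{\preceq y}$ is compact.

(P2) Bounded growth. For any $f \in \mathcal{F}$, there is a function $\lambda_f : \mathbb{R}^+ \to \mathbb{R}^+$, called the growth function, such that for any $y \ge 0$ and $\varepsilon > 0$, if $f_{\preceq y} \ne \emptyset$, then $\lambda_f(y) \ge \mathrm{diam}(f_{\preceq y})/\zeta$, where $\zeta$ is an absolute constant, the growth constant, depending only on the family of functions and not on $n$, and such that if $q \in \mathbb{R}^d$ with $\mathsf{d}(q, f_{\preceq y}) \le \varepsilon \lambda_f(y)$, then $f(q) \le (1 + \varepsilon)y$. This is equivalent to $f_{\preceq y} \oplus \mathsf{B}(0, \varepsilon \lambda_f(y)) \subseteq f_{\preceq (1+\varepsilon)y}$, where $\mathsf{B}(u, r)$ is the ball of radius $r$ centered at $u$.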
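(P3) Existence of a sketch. Given $\delta > 0$ and a subset $\mathcal{G} \subseteq \mathcal{F}$, there is a $\mathcal{H} \subseteq \mathcal{G}$ with $|\mathcal{H}| = O(1/\delta^{c_{sk}})$ and $y_0 = O(\mathrm{cl}(\mathcal{G})\,|\mathcal{G}|/\delta)$ such that $\mathcal{H}$ is a $(\delta, y_0)$-sketch, where $c_{sk}$ is some positive integer constant that depends on the given family of functions.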

We also require some straightforward properties from the computation model:

(C1) For all $q \in \mathbb{R}^d$ and $1 \le i \le n$, the value $f_i(q) = \mathsf{d}(q, f_i)$ is computable in $O(1)$ time.

(C2) For any $y \ge 0$, $r > 0$ and $i$, the set of grid cells approximating the sublevel set $(f_i)_{\preceq y}$ of $f_i$, that is $\mathsf{G}_{\approx r}((f_i)_{\preceq y})$ (see Definition 2.1), is computable in time linear in its size.

(C3) For any $f_i, f_j \in \mathcal{F}$, $1 \le i, j \le n$, the distance$_f$ $\mathsf{d}(f_i, f_j)$ is computable in $O(1)$ time.

We also assume that the growth function $\lambda_{f_i}(y)$ from Condition (P2) is easily computable, i.e., in $O(1)$ time.

Remark 2.10. We will use Condition (C2) for a given $y$ and $i$ only for $r$ at least $\Omega(\varepsilon \lambda_{f_i}(y))$; i.e., we will use a grid on the sublevel set at a low enough resolution, typically $\varepsilon$ times its growth function value at that point, which by Condition (P2) is also $\Omega(\varepsilon\, \mathrm{diam}((f_i)_{\preceq y}))$. As such the number of grid cells in the grid used is $O(1/\varepsilon^d)$.

2.3.1. Properties 

The following are basic properties that the functions under consideration have. Since these properties are straightforward but their proofs are somewhat tedious, we delegate the proofs to Appendix B. In the following, let $\mathcal{F}$ be a set of functions that satisfy the conditions above.

(L1) For any $f \in \mathcal{F}$, either $f_{\preceq 0} = \emptyset$ or $f_{\preceq 0}$ consists of a single point. (See Lemma B.1.)

(L2) If $\mathrm{cl}(\mathcal{G}) = 0$ for any non-empty subset $\mathcal{G}$ then $|\mathcal{G}| = 1$. (See Definition 2.7 and Observation B.2.)

(L3) Let $f \in \mathcal{F}$ and $y \ge 0$. For any $u, v \in f_{\preceq y}$, we have $uv \subseteq f_{\preceq (1+\zeta/2)y}$, where $uv$ denotes the segment joining $u$ to $v$. (See Lemma B.3.)

(L4) Let $A_1, \ldots, A_k \subseteq \mathbb{R}^d$ be compact connected sets, and let $uv$ be a segment such that $uv \cap A_i \ne \emptyset$, for $i = 1, \ldots, k$, and $uv \subseteq \bigcup_{i=1}^{k} A_i$. Then, the sets $A_1, \ldots, A_k$ are connected. (See Lemma B.4.)

(L5) For any $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$, $\delta \ge 0$ and $y \ge 0$, such that $\mathcal{H}$ is a $(\delta, y)$-sketch for $\mathcal{G}$, we have that $\mathrm{cl}(\mathcal{H}) \le (1 + \delta)(1 + \zeta/2) \max(y, \mathrm{cl}(\mathcal{G}))$. (See Lemma B.5.)

(L6) Let $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$, such that $\mathcal{H}$ is a $(\delta, y_0)$-sketch for $\mathcal{G}$ for some $\delta \ge 0$ and $y_0 \ge 0$. Let $q$ be a point such that $\mathsf{d}(q, \mathcal{G}) \ge y_0$. Then we have that $\mathsf{d}(q, \mathcal{H}) \le (1 + \delta)\,\mathsf{d}(q, \mathcal{G})$. (See Lemma B.6.)

2.3.2. Computing the connectivity level 

We implicitly assume that the above relevant quantities can be computed efficiently. For example, given some $\delta > 0$, and $y_0$ as per the bound in condition (P3), a $(\delta, y_0)$-sketch can be computed in $O(|\mathcal{G}|/\delta^{c_{sk}})$ time. We also assume that $\mathrm{cl}(\mathcal{G})$ can be computed efficiently without resorting to the "brute force" method. The brute force method computes the pairwise distance$_f$ of the functions and then computes an MST on the graph whose vertices are the functions and whose edge lengths are their distance$_f$. Then $\mathrm{cl}(\mathcal{G})$ is the longest edge of this MST.
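The brute force method can be sketched for the Euclidean setting of Example 2.4, where the distance$_f$ between the functions of two points is half their Euclidean distance. The function name `connectivity_level` is hypothetical, and Kruskal's algorithm with a union-find structure is just one convenient way to obtain the longest MST edge; it takes quadratic time and is meant as a baseline only.

```python
import math
from itertools import combinations

def connectivity_level(points):
    """Longest edge of an MST of the complete graph on the points, where the
    edge between p_i and p_j has length ||p_i - p_j|| / 2 (their distance_f)."""
    n = len(points)
    if n <= 1:
        return 0.0  # a single function has connectivity level 0
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]) / 2.0, i, j)
                   for i, j in combinations(range(n), 2))
    components = n
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
            if components == 1:
                return w  # the last edge added by Kruskal is the longest
    return 0.0

level = connectivity_level([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
```

In this example the balls around the first two points merge at radius 0.5, and the third point joins the component at radius 2.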



3. Summary of results 

Our main result is the following, the details of which are delegated to Section 4. 

Theorem 3.1. Let $\mathcal{F}$ be a set of $n$ functions in $\mathbb{R}^d$ that complies with our assumptions, see Section 2.3, and has sketch constant $c_{sk} \ge d$. Then, one can build a data-structure to answer ANN for this set of functions, with the following properties:

(A) The query time is $O(\log n + 1/\varepsilon^{c_{sk}})$.

(B) The preprocessing time is 0(ne~^'^='^ log '^'''^'^ n) . 

(C) The space used is 0(ne~'^~^~'^='' log^ n) . 

One can transform the data-structure into an AVD, and in the process improve the query time 
(the space requirement slightly deteriorates). See Section 4 for details. 

Corollary 3.2. Let $\mathcal{F}$ be a set of $n$ functions in $\mathbb{R}^d$ that complies with our assumptions, see Section 2.3, and has sketch constant $c_{sk} \ge d$. Then, one can build a data-structure to answer ANN for this set of functions, with the following properties:

(A) The improved query time is $O(\log n)$.

(B) The preprocessing time is 0[n/e^^^'log '^''^^ n). 

(C) The space used is S = 0[n/e^^' log rij. 

In particular, we can compute an AVD of complexity $O(S)$ for the given functions. That is, one can compute a space decomposition, such that every region has a single function associated with it, and for any point in this region, this function is the $(1 + \varepsilon)$-ANN among the functions of $\mathcal{F}$. Here, a region is either a cube, or the set difference of two cubes.

3.1. Distance functions for which the framework applies

3.1.1. Multiplicative distance functions with additive offsets 

We are given $n$ points in $\mathbb{R}^d$, where the point $p_i$ has a weight $w_i > 0$ and an offset $a_i \ge 0$ associated with it, for $i = 1, \ldots, n$. The multiplicative distance with offset induced by the $i$th point is $f_i(q) = w_i \|q - p_i\| + a_i$. In Section 5.1 we prove that these distance functions comply with the conditions of Section 2.3, and in particular we get the following result.

Theorem 3.3. Consider a set $P$ of $n$ points in $\mathbb{R}^d$, where the $i$th point $p_i$ has additive weight $a_i \ge 0$ and multiplicative weight $w_i > 0$. The $i$th point induces the additive/multiplicative distance function $f_i(q) = w_i \|q - p_i\| + a_i$. Then one can compute a $(1 + \varepsilon)$-AVD for these distance functions, with near linear space complexity, and logarithmic query time. See Theorem 3.1 for the exact bounds.
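For reference, the exact minimization diagram of these weighted distance functions can be evaluated by a linear scan, which is the baseline that the AVD is meant to approximate. The sketch below is not the data-structure of the theorem; the names `weighted_distance`, `exact_nearest`, and `is_ann`, and the `(point, w, a)` tuple layout, are hypothetical.

```python
import math

def weighted_distance(site, q):
    """f_i(q) = w_i * ||q - p_i|| + a_i for a site given as (p_i, w_i, a_i)."""
    p, w, a = site
    return w * math.dist(q, p) + a

def exact_nearest(sites, q):
    """Linear scan: index and value of the function realizing f_min(q)."""
    i = min(range(len(sites)), key=lambda k: weighted_distance(sites[k], q))
    return i, weighted_distance(sites[i], q)

def is_ann(sites, q, i, eps):
    """True iff site i is a (1 + eps)-approximate nearest neighbor of q."""
    _, best = exact_nearest(sites, q)
    return weighted_distance(sites[i], q) <= (1.0 + eps) * best

sites = [((0.0, 0.0), 1.0, 0.0),    # unit weight, no offset
         ((4.0, 0.0), 0.25, 0.5)]   # cheap per-unit distance, fixed offset
near_origin = exact_nearest(sites, (1.0, 0.0))
midpoint = exact_nearest(sites, (2.0, 0.0))
```

At the midpoint the two sites are geometrically equidistant, yet the second site wins because of its smaller multiplicative weight, which illustrates why the regions of such diagrams are not bounded by bisecting hyperplanes.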

3.1.2. Scaling distance 

Somewhat imprecisely, a connected body $O$ centered at a point $p$ is $\alpha$-rounded fat if it is $\alpha$-fat (that is, there is a radius $r$ such that $\mathrm{ball}(p, r) \subseteq O \subseteq \mathrm{ball}(p, \alpha r)$), and from any point $u$ on the boundary of $O$ the "cone" $\mathcal{CH}(\mathrm{ball}(p, r) \cup u)$ is contained inside $O$ (i.e., every boundary point sees a large fraction of the "center" of the object). We also assume that the boundary of each object $O$ has constant complexity.

For such an object $O$, the scaling distance of a point $q$ from $O$ is the minimum $t$ such that $q \in tO$ (where the scaling is done around its center $p$). Given $n$ $\alpha$-rounded fat objects, it is natural to ask for the Voronoi diagram induced by their scaling distance.

The natural distance functions induced by such a set of objects comply with the framework of Section 2.3, see Section 5.2 for details. As such, we get the following result.
Theorem 3.4. Consider a set $\mathcal{O}$ of $\alpha$-rounded fat objects in $\mathbb{R}^d$, for some constant $\alpha$. Then one can compute a $(1 + \varepsilon)$-AVD for the scaling distance functions induced by $\mathcal{O}$, with near linear space complexity, and logarithmic query time. See Theorem 3.1 and Corollary 3.2 for the exact bounds.

3.1.3. Nearest furthest-neighbor 

For a set of points $S \subseteq \mathbb{R}^d$ and a point $q$, the furthest-neighbor distance of $q$ from $S$ is $\mathsf{F}_S(q) = \max_{s \in S} \|q - s\|$; that is, it is the furthest one might have to travel from $q$ to arrive at a point of $S$. For example, $S$ might be the set of locations of facilities, where it is known that one of them is always open, and one is interested in the worst case distance a client has to travel to reach an open facility. The function $\mathsf{F}_S(\cdot)$ is known as the furthest-neighbor Voronoi diagram, and while its worst case combinatorial complexity is similar to the regular Voronoi diagram, it can be approximated using a constant size representation (in low dimensions), see [Har99].

Given $n$ sets of points $P_1, \ldots, P_n$ in $\mathbb{R}^d$, we are interested in the distance function $\mathsf{F}(q) = \min_i \mathsf{F}_i(q)$, where $\mathsf{F}_i(q) = \mathsf{F}_{P_i}(q)$. This quantity arises naturally when one tries to model uncertainty [AAH+13]; indeed, let $P_i$ be the set of possible locations of the $i$th point (i.e., the location of the $i$th point is chosen randomly, somehow, from the set $P_i$). Thus, $\mathsf{F}_i(q)$ is the worst case distance to the $i$th point, and $\mathsf{F}(q)$ is the worst-case nearest neighbor distance to the random point-set generated by picking the $i$th point from $P_i$, for $i = 1, \ldots, n$. We refer to $\mathsf{F}(\cdot)$ as the nearest furthest-neighbor distance, and we are interested in approximating it.

We prove in Section 5.3 that the distance functions $F_1, \ldots, F_n$ comply with the conditions of
the framework, and we get the following result.

Theorem 3.5. Given $n$ point sets $P_1, \ldots, P_n$ in $\mathbb{R}^d$ with a total of $m$ points, and a parameter
$\varepsilon > 0$, one can preprocess the points into an AVD, of size $O(n)$, for the nearest furthest-neighbor
distance defined by these point sets. One can now answer $(1+\varepsilon)$-approximate NN queries for this
distance in $O(\log n)$ time. (Note that the space and query time used depend only on $n$, and not
on the input size.)

4. Constructing the AVD 

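The input is a set $\mathcal{F}$ of $n$ functions satisfying the conditions of Section 2.3, and a number
$0 < \varepsilon < 1$. We preprocess $\mathcal{F}$, such that given a query point $q$ one can compute an $f \in \mathcal{F}$, where
$\mathsf{d}(q, \mathcal{F}) \le \mathsf{d}(q, f) \le (1+\varepsilon)\,\mathsf{d}(q, \mathcal{F})$.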

4.1. Building blocks 

4.1.1. Near neighbor 

Given a set of functions $\mathcal{G}$, a real number $\alpha > 0$, and a parameter $\varepsilon > 0$, a near-neighbor
data-structure $\mathcal{D}_{near} = \mathcal{D}_{near}(\mathcal{G}, \varepsilon, \alpha)$ can decide (approximately) if a point has distance larger or smaller
than $\alpha$. Formally, for a query point $q$, a near-neighbor query answers yes if $\mathsf{d}(q, \mathcal{G}) \le \alpha$, and no if
$\mathsf{d}(q, \mathcal{G}) > (1+\varepsilon)\alpha$. It can return either answer if $\mathsf{d}(q, \mathcal{G}) \in (\alpha, (1+\varepsilon)\alpha]$. If it returns yes, then it
also returns a function $f \in \mathcal{G}$ such that $\mathsf{d}(q, f) \le (1+\varepsilon)\alpha$. The query time of this data-structure
is denoted by $T_{\le}(m)$, where $m = |\mathcal{G}|$.

Lemma 4.1. Given a set of $m$ functions $\mathcal{G} \subseteq \mathcal{F}$, $\alpha > 0$ and $\varepsilon > 0$, one can construct a data
structure (which is a compressed quadtree), of size $O(m/\varepsilon^d)$, in $O(m \varepsilon^{-d} \log(m/\varepsilon))$ time, such that
given any query point $q \in \mathbb{R}^d$ one can answer a $(1+\varepsilon)$-approximate near-neighbor query for the
distance $\alpha$, in time $T_{\le}(m) = O(\log(m/\varepsilon))$.

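Proof: For each $f \in \mathcal{G}$ consider the canonical grid set $\mathsf{G}_{\approx r_f}(f_{\preceq \alpha})$, where $r_f = \varepsilon \lambda_f(\alpha) \ge \varepsilon\,\mathrm{diam}(f_{\preceq \alpha})/\zeta$,
and where $\lambda_f(\cdot)$ and $\zeta$ are the growth function and the growth constant, see (P2). The sublevel set of
interest is $\mathcal{G}_{\preceq \alpha}$ and its approximation is $\mathcal{C} = \bigcup_{f \in \mathcal{G}} \mathsf{G}_{\approx r_f}(f_{\preceq \alpha})$, as the bounded growth condition (P2)
implies that $f_{\preceq \alpha} \subseteq \bigcup \mathsf{G}_{\approx r_f}(f_{\preceq \alpha}) \subseteq f_{\preceq (1+\varepsilon)\alpha}$. The set of canonical cubes $\mathcal{C}$ can be stored in a
compressed quadtree $\mathcal{T}$, and given a query point we can decide if it is covered by some cube of $\mathcal{C}$
by performing a point location query in $\mathcal{T}$.

By Remark 2.10, $|\mathsf{G}_{\approx r_f}(f_{\preceq \alpha})| = O(\varepsilon^{-d})$. As such, the total number of canonical cubes in $\mathcal{C}$ is
$O(m/\varepsilon^d)$, and the compressed quadtree for storing them can be computed in $O(m \varepsilon^{-d} \log(m/\varepsilon))$
time [Har11].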

We mark a cell of the resulting quadtree by the function whose sublevel set it arose from (ties 
can be resolved arbitrarily). During query, if q is found in one of the cells we return yes and the 
function associated with the cell, otherwise we return no. 

If we have that $\mathsf{d}(q, \mathcal{G}) \le \alpha$, then the query point $q$ will be found in one of the marked cells,
since they cover $\mathcal{G}_{\preceq \alpha}$. As such, the query will return yes. Moreover, if the query does return a yes,
then $q$ belongs to a cube of $\mathcal{C}$ that is completely covered by $\mathcal{G}_{\preceq (1+\varepsilon)\alpha}$, as desired. ■
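
The construction in the proof can be illustrated on a simple special case. The sketch below is a simplification under stated assumptions, not the data-structure of the lemma: it handles only planar functions of the form $f(q) = w\,\|q - p\|$ (so every sublevel set is a disk), and it stores the covering cells in a hash table keyed by grid level instead of a compressed quadtree. All names are illustrative.

```python
import math

def build_near_neighbor(funcs, alpha, eps):
    """funcs: list of ((px, py), w) describing f(q) = w * ||q - p|| in the plane.
    Returns a dict mapping grid cells (level, i, j) to the index of a function."""
    cells = {}
    for idx, ((px, py), w) in enumerate(funcs):
        radius = alpha / w                      # sublevel set at alpha is B(p, radius)
        level = math.floor(math.log2(eps * radius / math.sqrt(2)))
        side = 2.0 ** level                     # cell diameter is at most eps * radius
        for i in range(math.floor((px - radius) / side),
                       math.floor((px + radius) / side) + 1):
            for j in range(math.floor((py - radius) / side),
                           math.floor((py + radius) / side) + 1):
                # Distance from p to the closed cell [i*side,(i+1)*side] x [j*side,(j+1)*side].
                dx = max(i * side - px, 0.0, px - (i + 1) * side)
                dy = max(j * side - py, 0.0, py - (j + 1) * side)
                if math.hypot(dx, dy) <= radius:
                    cells.setdefault((level, i, j), idx)  # ties resolved arbitrarily
    return cells

def near_neighbor_query(cells, q):
    """Answer (True, index) if q lies in a stored cell, and (False, None) otherwise."""
    levels = {level for (level, _, _) in cells}  # recomputed per query for brevity
    for level in levels:
        side = 2.0 ** level
        key = (level, math.floor(q[0] / side), math.floor(q[1] / side))
        if key in cells:
            return True, cells[key]
    return False, None
```

Every stored cell meets the disk of radius $\alpha/w$ and has diameter at most $\varepsilon\alpha/w$, so a yes answer certifies $f(q) \le (1+\varepsilon)\alpha$ for the returned function, while any $q$ with $f(q) \le \alpha$ lands in a stored cell. The quadtree of the lemma replaces the per-level probing by a single point-location query.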

4.1.2. Interval data structure 

Given a set of functions $\mathcal{G}$, real numbers $0 < \alpha \le \beta$, and $\varepsilon > 0$, the interval data structure
returns for a query point $q$, one of the following:

(A) If $\mathsf{d}(q, \mathcal{G}) \in [\alpha, \beta]$, then it returns a function $g \in \mathcal{G}$ such that $\mathsf{d}(q, g) \le (1+\varepsilon)\,\mathsf{d}(q, \mathcal{G})$. It
might also return such a function for values outside this interval.

(B) "$\mathsf{d}(q, \mathcal{G}) < \alpha$". In this case it returns a function $g \in \mathcal{G}$ such that $\mathsf{d}(q, g) < \alpha$.

(C) "$\mathsf{d}(q, \mathcal{G}) > \beta$".

The time to perform an interval query is denoted by $T_I(m, \alpha, \beta)$.

Lemma 4.2. Given a set of $m$ functions $\mathcal{G}$, an interval $[\alpha, \beta]$ and an approximation parameter
$\tau > 0$, one can construct an interval data structure of size $O(m \tau^{-d-1} \log(4\beta/\alpha))$, in time
$O(m \tau^{-d-1} \log(4\beta/\alpha) \log(m/\tau))$, such that given a query point $q$ one can answer a $(1+\tau)$-approximate
nearest neighbor query for the distances in the interval $[\alpha, \beta]$, in time
$T_I(m, \alpha, \beta, \tau) = O\!\left(\log \frac{m \log(4\beta/\alpha)}{\tau}\right)$.

Proof: Using Lemma 4.1, build a $(1+\tau/4)$-near neighbor data-structure $D_i$ for $\mathcal{G}$, for distance
$r_i = (\alpha/2)(1+\tau/4)^i$, for $i = 0, \ldots, L = \lceil \log_{1+\tau/4}(4\beta/\alpha) \rceil = O(\tau^{-1} \log(4\beta/\alpha))$. Clearly, an interval
query can be answered in three stages:

(A) Perform a point-location query in $D_0$. If the answer is yes then $\mathsf{d}(q, \mathcal{G}) \le \alpha$. We can also
return a function $g \in \mathcal{G}$ with $\mathsf{d}(q, g) \le \alpha$.

(B) Similarly, perform a point-location query in $D_L$. If the answer is no then $\mathsf{d}(q, \mathcal{G}) > \beta$ and
we are done.

(C) It must be that $\mathsf{d}(q, \mathcal{G}) \in [r_i, r_{i+1}]$ for some $i$. Find this $i$ by performing a binary search
on the data-structures $D_0, \ldots, D_L$, for the first $i$ such that $D_i$ returns no, but $D_{i+1}$ returns
yes. Clearly, $D_{i+1}$ provides us with the desired $(1+\tau/4)^2$-ANN to the query point.

To get the improved query time, observe that we can overlay these compressed quadtrees
$D_0, \ldots, D_L$ into a single quadtree. For every leaf (or compressed node) of this quadtree we
compute the original node with the lowest value covering this node. Clearly, finding the desired
distance can now be resolved by a single point-location query in this overlay of quadtrees. The total
size of these quadtrees is $S = O(L(m/\tau^d))$, and the total time to compute these quadtrees is
$T_1 = O(L(m/\tau^d) \log(m/\tau))$, and the time to compute their overlay is $O(S \log L)$. The time to
perform a point-location query in the overlayed quadtree is $O(\log S)$. ■
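
The three-stage query of the proof (before the overlay improvement) can be sketched as follows. This is a minimal illustration that treats each near-neighbor structure $D_i$ as a black-box callback; the callback name `near_query` and the string labels are assumptions made for the example.

```python
import math

def interval_radii(alpha, beta, tau):
    """Radii r_i = (alpha/2) * (1 + tau/4)**i for i = 0..L, where r_L >= 2*beta."""
    L = math.ceil(math.log(4 * beta / alpha, 1 + tau / 4))
    return [(alpha / 2) * (1 + tau / 4) ** i for i in range(L + 1)]

def interval_query(radii, near_query):
    """near_query(i) -> (yes, g): the answer of the i-th near-neighbor structure D_i."""
    yes, g = near_query(0)
    if yes:
        # d(q, G) <= (1 + tau/4) * alpha/2, which is at most alpha for tau <= 4.
        return "below", g
    last = len(radii) - 1
    yes, g = near_query(last)
    if not yes:
        return "above", None          # d(q, G) > r_L >= beta
    lo, hi, best = 0, last, g         # invariant: D_lo answers no, D_hi answers yes
    while hi - lo > 1:
        mid = (lo + hi) // 2
        yes, g = near_query(mid)
        if yes:
            hi, best = mid, g
        else:
            lo = mid
    return "inside", best
```

The binary search only relies on the invariant that $D_{lo}$ answers no and $D_{hi}$ answers yes, so it remains valid even though the approximate answers need not be monotone in $i$.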

Lemma 4.2 readily implies that if we somehow know a priori that the nearest neighbor distance lies
in an interval of values of polynomial spread, then we would get the desired data-structure by just
using Lemma 4.2. To overcome this unbounded spread problem, we first argue that, under
our assumptions, there are only a linear number of intervals where interesting things happen to the
distance function.

4.1.3. Connected components of the sublevel sets 

Given a finite set $X$ and a partition of it into disjoint sets $X = X_1 \cup \cdots \cup X_k$, let this partition be
denoted by $\langle X_1, \ldots, X_k \rangle_X$. For $1 \le i \le k$, each $X_i$ is a part of the partition.

Definition 4.3. For two partitions $P_A = \langle A_1, \ldots, A_k \rangle_X$ and $P_B = \langle B_1, \ldots, B_l \rangle_X$ of the same set
$X$, $P_B$ is a refinement of $P_A$, denoted by $P_B \sqsubseteq P_A$, if for any $B_i$ there exists a set $A_j$, such that
$B_i \subseteq A_j$. In the other direction, $P_A$ is a coarsening of $P_B$.

Observation 4.4. Given partitions $\Pi, \Xi$ of a finite set $X$, if $\Pi \sqsubseteq \Xi$ then $|\Xi| \le |\Pi|$.

Definition 4.5. Given partitions $\Pi = \langle X_1, \ldots, X_k \rangle_X \sqsubseteq \Xi = \langle X'_1, \ldots, X'_{k'} \rangle_X$, let $\phi(\Pi, \Xi, i)$ be the
function that returns the set of indices of sets in $\Pi$ whose union is $X'_i \in \Xi$.

Observation 4.6. Given partitions $\Pi \sqsubseteq \Xi$ of a set $X$ with $n$ elements, the partition function
$\phi(\Pi, \Xi, \cdot)$ can be computed in $O(n)$ time. For any $1 \le i \le |\Xi|$, the set $\phi(\Pi, \Xi, i)$ can be returned in
$O(|\phi(\Pi, \Xi, i)|)$ time, and its size can be returned in $O(1)$ time.
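
One way to realize Observation 4.6 is sketched below, assuming partitions are given as lists of non-empty, disjoint Python sets over the same ground set. The representation and the name `partition_function` are illustrative choices for the example.

```python
def partition_function(fine, coarse):
    """fine, coarse: lists of disjoint non-empty sets, with `fine` refining `coarse`.
    Returns phi, where phi[i] lists the indices of the parts of `fine`
    whose union is coarse[i]."""
    # One pass over the ground set: which coarse part owns each element.
    owner = {x: i for i, part in enumerate(coarse) for x in part}
    phi = [[] for _ in coarse]
    for k, part in enumerate(fine):
        # Any single element identifies the coarse part containing the whole fine part.
        phi[owner[next(iter(part))]].append(k)
    return phi
```

The table is built in time linear in the size of the ground set, after which `phi[i]` is returned in time proportional to its length and `len(phi[i])` in constant time.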

Definition 4.7. For $\mathcal{G} \subseteq \mathcal{F}$ and $\ell \ge 0$, consider the intersection graph of the sets $f_{\preceq \ell}$, for all
$f \in \mathcal{G}$. Each connected component is a cluster of $\mathcal{G}$ at level $\ell$. The partition of $\mathcal{G}$ by these
clusters, denoted by $\mathcal{C}(\mathcal{G}, \ell)$, is the $\ell$-clustering of $\mathcal{G}$.

The values $\ell$ at which the $\ell$-clustering of $\mathcal{F}$ changes are, intuitively, the critical values when
the sublevel set of $\mathcal{F}$ changes and which influence the AVD. These values are critical in trying to
decompose the nearest neighbor search on $\mathcal{F}$ into a search on smaller sets.

Observation 4.8. If $0 \le a \le b$ then $\mathcal{C}(\mathcal{G}, a) \sqsubseteq \mathcal{C}(\mathcal{G}, b)$.
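
As an illustration of Definition 4.7 and Observation 4.8, the sketch below computes the exact $\ell$-clustering by brute force for planar functions of the form $f(q) = w\,\|q - p\|$, whose sublevel sets are disks. The restriction to this family, the quadratic-time pairwise test and the function name are assumptions made for the example.

```python
import math

def clustering(funcs, level):
    """funcs: list of (p, w) for f(q) = w * ||q - p||.
    Returns the clustering at the given level as a sorted list of index lists."""
    parent = list(range(len(funcs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, (p, wi) in enumerate(funcs):
        for j in range(i + 1, len(funcs)):
            q, wj = funcs[j]
            # The closed disks B(p, level/wi) and B(q, level/wj) intersect.
            if math.dist(p, q) <= level / wi + level / wj:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(funcs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Raising the level can only merge clusters, which is the monotonicity stated in Observation 4.8.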

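The following lemma testifies that we can approximate the $\ell$-clustering quickly, for any number $\ell$.

Lemma 4.9. Given $\mathcal{G} \subseteq \mathcal{F}$, $\ell \ge 0$, and $\varepsilon > 0$, one can compute, in $O\big((m/\varepsilon^d) \log(m/\varepsilon)\big)$ time, a
partition $\Psi = \Psi_\varepsilon(\mathcal{G}, \ell)$, such that $\mathcal{C}(\mathcal{G}, \ell) \sqsubseteq \Psi \sqsubseteq \mathcal{C}(\mathcal{G}, (1+\varepsilon)\ell)$, where $m = |\mathcal{G}|$.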

Proof: For each $f \in \mathcal{G}$, tile the sublevel set $f_{\preceq \ell}$ by canonical cubes of small enough diameter,
such that the bounded growth condition (P2) assures all cubes are inside $f_{\preceq (1+\varepsilon)\ell}$. To this end, for
$f \in \mathcal{G}$, set $r_f = \varepsilon \lambda_f(\ell) \ge \varepsilon\,\mathrm{diam}(f_{\preceq \ell})/\zeta$, and compute the set $\mathcal{C}_f = \mathsf{G}_{\approx r_f}(f_{\preceq \ell})$, see
Definition 2.1. It is easy to verify that we have that

$$f_{\preceq \ell} \subseteq \cup\mathcal{C}_f \subseteq f_{\preceq (1+\varepsilon)\ell}. \qquad (1)$$

By assumption, we have that $|\mathcal{C}_f| = O(1/\varepsilon^d)$, and the total number of canonical cubes in all the
sets $\mathcal{C}_f$ for $f \in \mathcal{G}$ is $O(m/\varepsilon^d)$. We throw all these canonical cubes into a compressed quadtree;
this takes $O\big((m/\varepsilon^d) \log(m/\varepsilon)\big)$ time. Here, every node of the compressed quadtree is marked if it
belongs to some of these sets, and if so, to which of the sets. Two sets $\cup\mathcal{C}_f$ and $\cup\mathcal{C}_g$ intersect if
and only if there are two canonical cubes, in these two sets, such that they overlap; that is, one of
them is a sub-cube of the other. Initialize a union-find data-structure, and traverse the compressed
quadtree using DFS, keeping track of the current connected component, and performing a union
operation whenever encountering a marked node (i.e., all the canonical nodes associated with it
are merged into the current connected component). Finally, we perform a union operation for
all the cells in $\mathcal{C}_f$, for all $f \in \mathcal{G}$. Clearly, this results in the desired connected components of the
intersection graph of the sets $\cup\mathcal{C}_f$ (note that we consider two sets as intersecting only if their interiors
intersect). Translating each such connected set of canonical cubes back to the functions that gave
rise to them results in the desired partition. ■

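Remark 4.10. The partition $\Psi$ computed by Lemma 4.9 is monotone, that is, for $\ell \le \ell'$ and
$\varepsilon \le \varepsilon'$, we have $\Psi_\varepsilon(\mathcal{G}, \ell) \sqsubseteq \Psi_{\varepsilon'}(\mathcal{G}, \ell')$. Moreover, for each cluster $C \in \Psi_\varepsilon(\mathcal{G}, \ell)$, we have that
$\mathrm{cl}(C) \le (1+\varepsilon)\ell$.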

4.1.4. Computing a splitting distance 

Definition 4.11. Given a partition $\Psi = \Psi_\varepsilon(\mathcal{G}, \ell)$ of $\mathcal{G}$, with $m = |\Psi|$ clusters, a distance $x$ is a
splitting distance if $m/4 \le |\Psi_1(\mathcal{G}, x/4)|$ and $|\Psi_1(\mathcal{G}, x)| \le (7/8)m$.

Lemma 4.12. Given a partition $\Psi = \Psi_\varepsilon(\mathcal{G}, \ell)$ of $\mathcal{G}$, one can compute a splitting distance for it, in
expected $O(n(\log n + t))$ time, where $n = |\mathcal{G}|$ and $t$ is the maximum cluster size in $\Psi$.

Proof: For each cluster $C \in \Psi$, let $r_C$ be its distance from all the functions in $\mathcal{G} \setminus C$; that is,
$r_C = \min_{f \in C} \min_{g \in \mathcal{G} \setminus C} \mathsf{d}(f, g)$. Note that $r_C \ge \ell$. Now, let $r_1 \le r_2 \le \cdots \le r_m$ be these
distances for the $m$ clusters of $\Psi$. We randomly pick a cluster $C \in \Psi$ and compute $\ell' = r_C$ for it
by brute force, computing the distance of each function of $C$ with the functions of $\mathcal{G} \setminus C$.

Let $i$ be the rank of $\ell' = r_C$ among $r_1, \ldots, r_m$. With probability $1/2$, we have that $m/4 \le i \le
(3/4)m$. If so we have that:

(A) All the clusters that correspond to $r_i, \ldots, r_m$ are singletons in the partition $\Psi_1(\mathcal{G}, \ell'/4)$, as
the distance of each one of these clusters is at least $\ell'$. We conclude that $|\Psi_1(\mathcal{G}, \ell'/4)| \ge m/4$.

(B) All the clusters of $\Psi$ that correspond to $r_1, \ldots, r_i$ are contained inside a larger cluster
of $\Psi_1(\mathcal{G}, \ell')$ (i.e., they were merged with some other cluster). But then, the number of
clusters in $\Psi_1(\mathcal{G}, \ell')$ is at most $(7/8)m$. Indeed, put an edge between such a cluster and the
cluster realizing the smallest distance with it. This graph has at least $e \ge m/4$ edges, and
it is easy to see that each component of size at least 2 in the underlying undirected graph
has the same number of edges as vertices. As such the number of singleton components
is at most $m - e$ while the number of components of size at least 2 is at most $e/2$. It
follows that the total number of components is at most $m - e/2 \le 7m/8$. Since each such
component corresponds to a cluster in $\Psi_1(\mathcal{G}, \ell')$ the claim is proved.

Now, compute $\Psi_1(\mathcal{G}, \ell')$ and $\Psi_1(\mathcal{G}, \ell'/4)$ using Lemma 4.9. With probability at least half they
have the desired sizes, and we are done. Otherwise, we repeat the process. In each iteration we 
spend 0(n(logn + t)) time, and the probability for success is half. As such, in expectation the 
number of rounds needed is constant. ■

Search( $\mathcal{G}$, $\mathcal{T}$, $q$ )
    // $\mathcal{G}$ : set of functions
    // $\mathcal{T} = \Psi_1(\mathcal{G}, \ell)$ for some value $\ell$
    // Invariant: $\mathsf{d}(q, \mathcal{G}) > \ell N$
    if $|\mathcal{T}| = 1$ then
        return $\mathsf{d}(q, \mathcal{G}) = \min_{f \in \mathcal{G}} \mathsf{d}(q, f)$        (*)
    $x \leftarrow$ compute a splitting distance of $\mathcal{T}$, see Lemma 4.12
    // Perform an interval approximate nearest neighbor query on the
    // interval $[x/8N, 8xN^2]$ for the set $\mathcal{G}$, see Lemma 4.2.
    if $\mathsf{d}(q, \mathcal{G}) \in [x/8N, 8xN^2]$ or $(1+\varepsilon/4)$-ANN found then
        return nearest function found by the
               $(1+\varepsilon/4)$-approximate interval query.
    if $\mathsf{d}(q, \mathcal{G}) < x/8N$ then
        $f \leftarrow$ 2-approximate near neighbor query on $\mathcal{G}$
               and distance $x/8$, see Lemma 4.1.
        Find cluster $C \in \Psi_1(\mathcal{G}, x/4)$, such that $f \in C$,
               see Lemma 4.9.
        return Search( $C$, $\mathcal{T}[C]$, $q$ )
    if $\mathsf{d}(q, \mathcal{G}) > 8xN^2$ then
        return Search( $\mathcal{G}$, $\Psi_1(\mathcal{G}, xN)$, $q$ )        (**)



Figure 4.1: Search algorithm: We are given a query point $q$, and an approximation parameter
$\varepsilon > 0$. The quantity $N$ is a parameter to be specified shortly. Initially, we call this procedure on
the set of functions $\mathcal{F}$ with $\mathcal{T}$ being the partition of $\mathcal{F}$ into singletons (i.e., $\ell = 0$). Here, $\mathcal{T}[C]$
denotes the partition of $C$ induced by the partition $\mathcal{T}$.

4.2. The search procedure 

4.2.1. An initial "naive" implementation 

The search procedure is presented in Figure 4.1. 

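Lemma 4.13. Search($\mathcal{G}$, $\mathcal{T}$, $q$) returns a function $f \in \mathcal{G}$, such that $\mathsf{d}(q, f) \le (1+\varepsilon)\,\mathsf{d}(q, \mathcal{G})$. The
depth of the recursion of Search is $h = O(\log n)$, where $n = |\mathcal{G}|$.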

Proof: The proof is by induction on the size of $\mathcal{T}$. If $|\mathcal{T}| = 1$, then the function realizing $\mathsf{d}(q, \mathcal{G})$ is
returned, and the claim is true.

Let $x$ be the computed splitting distance of $\mathcal{T}$. Next, the procedure performs a $(1+\varepsilon/4)$-approximate
interval nearest-neighbor query for $q$ on the range $[x/8N, 8xN^2]$. If this computed the
approximate nearest neighbor then we are done.

Otherwise, it must be that either $\mathsf{d}(q, \mathcal{G}) < x/8N$ or $\mathsf{d}(q, \mathcal{G}) > 8xN^2$, and significantly, we know
which of the two options it is:

(A) If $\mathsf{d}(q, \mathcal{G}) < x/8N$ then doing an approximate near-neighbor query on $\mathcal{G}$ and distance $x/8$
returns a function $f \in \mathcal{G}$ such that $\mathsf{d}(q, f) \le x/4$. Clearly, the nearest neighbor to $q$ must be
in the cluster containing $f$ in the partition $\Psi_1(\mathcal{G}, x/4)$, and Search recurses on this cluster.
Now, by induction, the returned ANN is correct.
Since $x$ is a splitting distance of $\mathcal{T}$, see Definition 4.11, we have $|\mathcal{T}|/4 \le |\Psi_1(\mathcal{G}, x/4)|$ and
$\mathcal{T} \sqsubseteq \Psi_1(\mathcal{G}, x/4)$. As such, since $C$ is one of the clusters of $\Psi_1(\mathcal{G}, x/4)$, the induced partition
of $C$ by $\mathcal{T}$ (i.e., $\mathcal{T}[C]$) can have at most $(1 - 1/4)|\mathcal{T}|$ clusters.

(B) Otherwise, we have $\mathsf{d}(q, \mathcal{G}) > 8xN^2$. Since $x$ is a splitting distance, we have that $|\Psi_1(\mathcal{G}, x)| \le
(7/8)|\mathcal{T}|$, see Definition 4.11. We recurse on $\mathcal{G}$ and a partition that has fewer clusters, and
by induction, the returned answer is correct.

In each step of the recursion, the partition shrinks to at most a $7/8$ fraction of its previous size. As such, after a
logarithmic number of recursive calls, the procedure is done. ■

4.2.2. But where is the beef? Modifying Search to provide fast query time 

The reader might wonder how we are going to get an efficient search algorithm out of Search, as
the case that $\mathcal{T}$ is a single cluster still requires us to scan all the functions in this
cluster and compute their distance from the query point $q$. Note, however, that we have the invariant
that the distance of interest is polynomially larger than the connectivity level of each of the clusters
of $\mathcal{T}$. In particular, precomputing, for all the sets of functions that (*) might be called on,
their $\varepsilon/8$-sketches, and answering the query by computing the distance on the sketches, reduces the
query time to $O(1/\varepsilon^{c_{sk}} + \log^2 n)$ (assuming that we precomputed all the data-structures used by
the query process). Indeed, an interval query takes $O(\log n)$ time, and there are $O(\log n)$ such queries.
The final query on the sketch takes time proportional to the sketch size, which is $\Theta(1/\varepsilon^{c_{sk}})$.

As such, the major challenge is not making the query process fast, but rather building the search 
structure quickly, and arguing that it requires little space. 

4.2.3. Sketching a sketch 

To improve the efficiency of the preprocessing for Search, we are going to use sketches more
aggressively. Specifically, for each of the clusters of $\mathcal{T}$, we can compute their $\delta$-sketches, for $\delta =
\varepsilon/(8h) = O(\varepsilon/\log n)$, see Lemma 4.13. From this point on, when we manipulate this cluster, we do
it on its sketch. To make this work set $N = n^{c_{sk}}$, see (P3) and Lemma B.5.

The only place in the algorithm where we need to compute the sketches is in (**) in Figure 4.1.
Specifically, we compute $\Psi_1(\mathcal{G}, xN)$, and for each new cluster $C \in \Psi_1(\mathcal{G}, xN)$, we combine all the
sketches of the clusters $D \in \mathcal{T}$ such that $D \subseteq C$ into a single set of functions. We then compute a
$\delta$-sketch for this set, and this sketch is this cluster from this point on. In particular, the recursive calls
to Search would send the sketches of the clusters, and not the clusters themselves. Conceptually,
the recursive call would also pass the minimum distance where the sketches are active; it is easy
to verify that we use these sketches only at distances that are far away and are thus allowable (i.e.,
the sketches represent the functions they correspond to well at these distances).

Importantly, whenever we compute such a new set, we do so for a distance that is bigger by a
polynomial factor (i.e., $N$) than the values used to create the sketches of the clusters being merged.
Indeed, observe that $x \ge \ell$ and as such $xN$ is $N$ times bigger than $\ell$ (an upper bound on the value
used to compute the input sketches).

As such, all these sketches are valid, and can be used at this distance (or any larger distance).
Of course, the quality of the sketch deteriorates. In particular, since the depth of recursion is $h$,
the worst quality of any of the sketches created in this process is at most $(1+\delta)^h \le 1 + \varepsilon/4$.
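
The last inequality can be verified in one line. The derivation below is a sketch assuming $0 < \varepsilon < 1$, as in the input of Section 4, and uses only the elementary bounds $1 + x \le e^x$ and $e^x \le 1 + 2x$ for $0 \le x \le 1$:

```latex
\[
  (1+\delta)^h \;\le\; e^{\delta h} \;=\; e^{\varepsilon/8}
  \;\le\; 1 + 2\cdot\frac{\varepsilon}{8} \;=\; 1 + \frac{\varepsilon}{4},
  \qquad \text{where } \delta = \frac{\varepsilon}{8h}.
\]
```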

Significantly, before using such a sketch, we would shrink it by computing an $\varepsilon/8$-sketch of it.
This would reduce the sketch size to $O(1/\varepsilon^{c_{sk}})$. Note, however, that this still does not help us as
far as recursion is concerned: we must pass the larger $\delta$-sketches in the recursive call of (**).

This completes the description of the search procedure. It is still unclear how to precompute
all the data-structures required during the search. To do that, we need to better understand what
the search process does.

4.3. The connectivity tree, and the preprocessing 

Given a set of functions $\mathcal{F}$, create a tree tracking the connected components of the MST of the
functions. Formally, initially we start with $n$ singletons (which are the leaves of the tree) that are
labeled with the value zero, and we store them in a set $\mathcal{A}$ of active nodes. Now, we compute for
each pair of sets of functions $X, Y \in \mathcal{A}$ the distance $\mathsf{d}(X, Y)$, and let $X', Y'$ be the pair realizing
the minimum of this quantity. Merge the two sets into a new set $Z = X' \cup Y'$, create a new node
for this set having the nodes for $X'$ and $Y'$ as children, and set its label to be $\mathsf{d}(X', Y')$. Finally,
remove $X'$ and $Y'$ from $\mathcal{A}$ and insert $Z$ into it. Repeat until there is a single element in $\mathcal{A}$. Clearly,
the result is a tree that tracks the connected components of the MST.
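
The merging process just described can be written down directly. The sketch below is a naive cubic-time illustration that takes the pairwise distance between functions as a callback and uses the exact distance, not the rounded variant introduced next; the tuple layout of the nodes and the name `connectivity_tree` are assumptions made for the example.

```python
from itertools import combinations

def connectivity_tree(n, dist):
    """dist(i, j): distance between functions i and j (assumed symmetric).
    Each node is a tuple (label, members, children); the root is returned."""
    active = [(0.0, frozenset([i]), ()) for i in range(n)]  # leaves are labeled zero
    while len(active) > 1:
        # Pair of active sets realizing the minimum set-to-set distance.
        label, a, b = min(
            ((min(dist(i, j) for i in x[1] for j in y[1]), x, y)
             for x, y in combinations(active, 2)),
            key=lambda t: t[0],
        )
        active.remove(a)
        active.remove(b)
        active.append((label, a[1] | b[1], (a, b)))
    return active[0]

if __name__ == "__main__":
    # Hypothetical input: four functions indexed by points on a line.
    pts = [0.0, 1.0, 3.0, 7.0]
    root = connectivity_tree(len(pts), lambda i, j: abs(pts[i] - pts[j]))
    print(root[0], sorted(root[1]))
```

Labels never decrease along a path from a leaf to the root, which is what allows a level of the tree to be read off as a clustering.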

To make the presentation consistent, let $\widetilde{\mathsf{d}}(X, Y)$ be the minimum $x$ such that $\Psi_1(X \cup Y, x)$
is connected. Computing $\widetilde{\mathsf{d}}(X, Y)$ can be done by computing $\widetilde{\mathsf{d}}(f, g)$ for each pair of functions
separately. This in turn can be done by first computing $\alpha = \mathsf{d}(f, g)$ and observing that $r = \widetilde{\mathsf{d}}(f, g)$ is
between $\alpha/4$ and $\alpha$. In particular, $r$ must be a power of two, so there are only 3 candidate values
to consider, and which is the right one can be decided using Lemma 4.9.

So, in the above, we use $\widetilde{\mathsf{d}}(\cdot, \cdot)$ instead of $\mathsf{d}(\cdot, \cdot)$, and let $\mathcal{H}$ be the resulting tree. For a value $\ell$,
let $L_{\mathcal{H}}(\ell)$ be the set of nodes such that their label is smaller than $\ell$, but their parent label is larger
than $\ell$. It is easy to verify that $L_{\mathcal{H}}(\ell)$ corresponds to $\Psi = \Psi_1(\mathcal{F}, \ell)$; indeed, every cluster $C \in \Psi$
corresponds to a node $u \in L_{\mathcal{H}}(\ell)$, such that the set of functions stored in the leaves of the subtree
of $u$, denoted by $F(u)$, is $C$. The following can be easily proved by induction.

Lemma 4.14. Consider a recursive call Search($\mathcal{G}$, $\mathcal{T}$, $q$) made during the search algorithm
execution. Then $\mathcal{G} = F(u)$, and $\mathcal{T} = \{ F(v) \mid v \in L_{\mathcal{H}}(\ell) \text{ and } v \text{ is in the subtree of } u \}$.
That is, a recursive call of Search corresponds to a subtree of $\mathcal{H}$.

Of course, not all possible subtrees are candidates to be such a recursive call. In particular,
Search can now be interpreted as working on a subtree $T$ of $\mathcal{H}$, as follows:

(A) If $T$ is a single node $u$, then find the closest function to $q$ in $F(u)$. Using the sketch this can be
done quickly.

(B) Otherwise, compute a distance $x$, such that the number of nodes in the level $L_T(x)$ is
roughly half the number of leaves of $T$.

(C) Using the interval data-structure, determine if the distance $\mathsf{d}(q, F(T))$ is in the range $[x/8N, 8xN^2]$.
If so, we found the desired ANN.

(D) If $\mathsf{d}(q, F(T)) > 8xN^2$ then continue recursively on the portion of $T$ above $L_T(x)$.

(E) If $\mathsf{d}(q, F(T)) < x/8N$ then we know the node $u \in L_T(x)$ such that the ANN belongs to
$F(u)$. Continue the search recursively on the subtree of $T$ rooted at $u$.

That is, Search breaks $T$ into subtrees, and continues the search recursively on one of the
subtrees. Significantly, every such subtree has a constant fraction of the size of $T$, and every edge of
$T$ belongs to a single such subtree.

The preprocessing now works by precomputing all the data-structures required by Search.
Of course, the most natural approach would be to precompute $\mathcal{H}$, and build the search tree by
simulating the above recursion on $\mathcal{H}$. Fortunately, this is not necessary: we simulate running
Search, and investigate all the different recursive calls. We thus use $\mathcal{H}$ only in analyzing
the preprocessing running time. See Figure 4.1.

In particular, given a subtree $T$ with $m$ edges, the corresponding partition $\mathcal{T}$ would have at
most $m$ sets. Each such set would have a $\delta$-sketch, and we compute an $\varepsilon/8$-sketch for each one
of these sketches. Namely, the input size here is $M = O(m/\delta^{c_{sk}})$. Computing the $\varepsilon/8$-sketches
for each one of these sketches reduces the total number of functions to $M' = O(m/\varepsilon^{c_{sk}})$, and
takes $U_1 = O(M/\varepsilon^{c_{sk}}) = O(m(\varepsilon\delta)^{-c_{sk}})$ time, see Section 2.3.2. Computing the splitting distance,
using Lemma 4.12, takes $U_2 = O\big(M'(\log M' + 1/\varepsilon^{c_{sk}})\big) = O(m \varepsilon^{-2c_{sk}} \log m)$ time. Computing the
interval data-structure of Lemma 4.2 takes $U_3 = O(M' \varepsilon^{-d-1} \log n \log M')$ time, and requires $S_1 =
O(M' \varepsilon^{-d-1} \log n)$ space. This breaks $T$ into edge disjoint subtrees $T_1, \ldots, T_t$, and we compute the
search data-structure for each one of them separately (each one of these subtrees is smaller by a
constant fraction than the original tree). Finally, we need to compute the $\delta$-sketches for the clusters
sent to the appropriate recursive calls, and this takes $U_4 = O(M/\delta^{c_{sk}})$, by Section 2.3.2.

Every edge of the tree $T$ gets charged for the amount of work spent in building the top level
data-structure. That is, the top level amortized work each edge of $T$ has to pay is

$$O\!\left(\frac{U_1 + U_2 + U_3 + U_4}{m}\right) = O\!\left((\varepsilon\delta)^{-c_{sk}} + \varepsilon^{-2c_{sk}} \log m + \varepsilon^{-d-1-c_{sk}} \log^2 n + \delta^{-2c_{sk}}\right) = O\!\left(\varepsilon^{-2c_{sk}} \log^{2c_{sk}} n\right),$$

assuming $c_{sk} \ge 2$. Since an edge of $T$ gets charged at most $O(\log n)$ times by this recursive
construction, we conclude that the total preprocessing time is $O(n \varepsilon^{-2c_{sk}} \log^{2c_{sk}+1} n)$.

As for the space, we have by the same argumentation that each edge requires $O(\log n \cdot (S_1/m)) =
O(\varepsilon^{-d-1-c_{sk}} \log^2 n)$ space. As such, the overall space used by the data-structure is $O(n \varepsilon^{-d-1-c_{sk}} \log^2 n)$. As
for the query time, it boils down to $O(\log n)$ interval queries, and then scanning one $O(\varepsilon)$-sketch.
As such, this takes $O(\log^2 n + 1/\varepsilon^{c_{sk}})$ time.

4.4. The result 

Restatement of Theorem 3.1. Let $\mathcal{F}$ be a set of $n$ functions in $\mathbb{R}^d$ that complies with our
assumptions, see Section 2.3, and has sketch constant $c_{sk} \ge d$. Then, one can build a data-structure
to answer ANN for this set of functions, with the following properties:

(A) The query time is $O(\log n + 1/\varepsilon^{c_{sk}})$.

(B) The preprocessing time is $O(n \varepsilon^{-2c_{sk}} \log^{2c_{sk}+1} n)$.

(C) The space used is $O(n \varepsilon^{-d-1-c_{sk}} \log^2 n)$.

Proof: The query time stated above is $O(\log^2 n + 1/\varepsilon^{c_{sk}})$. To get the improved query time, we
observe that Search performs a sequence of point-location queries in a sequence of interval near
neighbor data-structures (i.e., compressed quadtrees), and then it scans a set of functions of size
$\Theta(1/\varepsilon^{c_{sk}})$ to find the ANN. We take all these quadtrees spread through our data-structure, and
assign them priorities, where a compressed quadtree $Q_1$ has higher priority than a compressed quadtree $Q_2$ if
$Q_1$ is queried after $Q_2$ for any search query. This defines an acyclic ordering on these compressed
quadtrees. Overlaying all these compressed quadtrees together, one needs to return, for the query
point, the leaf of the highest priority quadtree that contains the query point. This can be easily
done by scanning the overlaid quadtree, and for every leaf computing the highest priority leaf
that contains it (observe that here we are overlaying only the nodes in the compressed quadtrees
that are marked by some sublevel set; nodes that are empty are ignored).

A tedious but straightforward induction implies that doing a point-location query in the
resulting quadtree is equivalent to running the search procedure as described above. Once we have found
the leaf that contains the query point, we scan the sketch associated with this cell, and return the
computed nearest-neighbor. ■

Restatement of Corollary 3.2. Let $\mathcal{F}$ be a set of $n$ functions in $\mathbb{R}^d$ that complies with our
assumptions, see Section 2.3, and has sketch constant $c_{sk} \ge d$. Then, one can build a data-structure
to answer ANN for this set of functions, with the following properties:

(A) The improved query time is $O(\log n)$.

(B) The preprocessing time is $O\big(n/\varepsilon^{O(1)} \log^{2c_{sk}+1} n\big)$.

(C) The space used is $S = O\big(n/\varepsilon^{O(1)} \log^2 n\big)$.

In particular, we can compute an AVD of complexity $O(S)$ for the given functions. That is, one
can compute a space decomposition, such that every region has a single function associated with it,
and for any point in this region, this function is the $(1+\varepsilon)$-ANN among the functions of $\mathcal{F}$. Here,
a region is either a cube, or the set difference of two cubes.

Proof: We build the data-structure of Theorem 3.1, except that instead of linearly scanning the
sketch during the query time, we preprocess each such sketch for an exact point-location query;
that is, we compute the lower envelope of the sketch and preprocess it for vertical ray shooting
[AE98]. This would require $O(1/\varepsilon^{O(1)})$ space for each such sketch, and the linear scanning that
takes $O(1/\varepsilon^{O(1)})$ time is now replaced by a point-location query that takes $O(\log(1/\varepsilon^{O(1)})) =
O(\log(1/\varepsilon)) = O(\log n)$, as desired.

As for the second part, observe that every leaf of the compressed quadtree is the set difference
of two canonical grid cells. The lower envelope of the functions associated with such a leaf induces
a partition of this leaf into regions with total complexity $O(1/\varepsilon^{O(1)})$. ■

5. Applications 

We present some concrete classes of functions that satisfy our framework, and for which we construct 
AVD's efficiently. 

5.1. Multiplicative distance functions with additive offsets 

As a warm-up we present the simpler case of additively offset multiplicative distance functions. The
results of this section are almost subsumed by more general results in Section 5.2. Here the sublevel
sets look like expanding balls, but there is a time lag before the balls even come into existence; that is,
sublevel sets are empty up to a certain level, which corresponds to the additive offsets. In Section 5.2
the sublevel sets are more general fat bodies but there is no additive offset. The results in the
present section essentially give an AVD construction of approximate weighted Voronoi diagrams.
More formally, we are given a set of points $P = \{p_1, \ldots, p_n\}$. For $i = 1, \ldots, n$, the point $p_i$ has
weight $w_i > 0$, and a constant $a_i \ge 0$ associated with it. We define $f_i(q) = w_i \|q - p_i\| + a_i$. Let
$\mathcal{F} = \{f_1, \ldots, f_n\}$. We have $(f_i)_{\preceq y} = \emptyset$ for $y < a_i$ and $(f_i)_{\preceq y} = B\!\left(p_i, \frac{y - a_i}{w_i}\right)$ for $y \ge a_i$. Checking
conditions (C1) and (C2) is trivial. As for (C3) we have the following easy lemma.

Lemma 5.1. For any $1 \le i, j \le n$ we have

$$\mathsf{d}(f_i, f_j) = \max\!\left( \max(a_i, a_j),\; \|p_i - p_j\| \frac{w_i w_j}{w_i + w_j} + \frac{a_i w_j + a_j w_i}{w_i + w_j} \right).$$

Proof: The $i$th distance function is $f_i(q) = w_i \|q - p_i\| + a_i$. As such, for $y < \max(a_i, a_j)$ either
$(f_i)_{\preceq y} = \emptyset$ or $(f_j)_{\preceq y} = \emptyset$, and $(f_i)_{\preceq y} \cap (f_j)_{\preceq y} = \emptyset$. For $y \ge \max(a_i, a_j)$, we have

$$f_i(q) \le y \implies w_i \|q - p_i\| + a_i \le y \implies \|q - p_i\| \le \frac{y - a_i}{w_i},$$

which implies that $q \in B\!\left(p_i, \frac{y - a_i}{w_i}\right)$; that is, we have $(f_i)_{\preceq y} = B\!\left(p_i, \frac{y - a_i}{w_i}\right)$ and $(f_j)_{\preceq y} = B\!\left(p_j, \frac{y - a_j}{w_j}\right)$.

Now, if $p_i = p_j$ then the distance between the two functions is the minimal value such
that their sublevel sets are not empty, and this is $\max(a_i, a_j)$. In particular, the given expression

$$\alpha = \max\!\left( \max(a_i, a_j),\; \|p_i - p_j\| \frac{w_i w_j}{w_i + w_j} + \frac{a_i w_j + a_j w_i}{w_i + w_j} \right)$$

evaluates to $\max(a_i, a_j)$, as desired.

If $p_i \ne p_j$ the sublevel sets intersect for the first time when the balls $B\!\left(p_i, \frac{y - a_i}{w_i}\right)$ and $B\!\left(p_j, \frac{y - a_j}{w_j}\right)$
touch at a point that belongs to the segment $p_i p_j$. Clearly then we have

$$\|p_i - p_j\| = \frac{y - a_i}{w_i} + \frac{y - a_j}{w_j} \implies w_i w_j \|p_i - p_j\| = w_j (y - a_i) + w_i (y - a_j)$$
$$\implies (w_i + w_j)\, y = w_i w_j \|p_i - p_j\| + w_j a_i + w_i a_j$$
$$\implies y = \|p_i - p_j\| \frac{w_i w_j}{w_i + w_j} + \frac{a_i w_j + a_j w_i}{w_i + w_j}. \qquad ■$$
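
The closed form of Lemma 5.1 is easy to check numerically against the defining property (the smallest level at which the two sublevel balls meet). The sketch below is illustrative only; the function names and the sample values are assumptions and do not come from the paper.

```python
import math

def function_distance(p_i, w_i, a_i, p_j, w_j, a_j):
    """d(f_i, f_j) for f(q) = w * ||q - p|| + a, following the formula of Lemma 5.1."""
    gap = math.dist(p_i, p_j)
    touching = (gap * w_i * w_j + a_i * w_j + a_j * w_i) / (w_i + w_j)
    return max(a_i, a_j, touching)

def sublevel_sets_intersect(y, p_i, w_i, a_i, p_j, w_j, a_j):
    """Direct test: both balls are non-empty and their radii together cover the gap."""
    if y < max(a_i, a_j):
        return False
    return (y - a_i) / w_i + (y - a_j) / w_j >= math.dist(p_i, p_j)

if __name__ == "__main__":
    args = ((0.0, 0.0), 1.0, 0.0, (6.0, 0.0), 2.0, 3.0)
    y = function_distance(*args)
    print(y, sublevel_sets_intersect(y, *args), sublevel_sets_intersect(y - 0.1, *args))
```

In the sample, the balls have radii $5$ and $1$ at level $y = 5$ and touch exactly, while at any smaller level they are disjoint.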

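Lemma 5.2. Given $1 \le i, j \le n$ such that $w_i \le w_j$. Suppose $y \ge \max(a_i, a_j)$. Then $(f_j)_{\preceq y} \subseteq
(f_i)_{\preceq (1+\delta) y}$ if and only if

$$y \ge \frac{\|p_i - p_j\| + \frac{a_i}{w_i} - \frac{a_j}{w_j}}{\frac{1+\delta}{w_i} - \frac{1}{w_j}}.$$

Proof: For $y \ge \max(a_i, a_j)$ we have that $(f_i)_{\preceq y} = B\!\left(p_i, \frac{y - a_i}{w_i}\right)$ and $(f_j)_{\preceq y} = B\!\left(p_j, \frac{y - a_j}{w_j}\right)$. If
$p_i = p_j$ then for any $y$ such that $\frac{(1+\delta) y - a_i}{w_i} \ge \frac{y - a_j}{w_j}$ we have that $(f_j)_{\preceq y} \subseteq (f_i)_{\preceq (1+\delta) y}$. Clearly
this condition is also necessary. It is easy to verify that this is equivalent to the desired expression.

Consider the case $p_i \ne p_j$. Sufficiency: Notice that for any $u \in B\!\left(p_j, \frac{y - a_j}{w_j}\right)$ we have $\|u - p_i\| \le
\|p_i - p_j\| + \|p_j - u\| \le \|p_i - p_j\| + \frac{y - a_j}{w_j}$ by the triangle inequality. Therefore, if $\frac{(1+\delta) y - a_i}{w_i} \ge
\|p_i - p_j\| + \frac{y - a_j}{w_j}$, then $B\!\left(p_j, \frac{y - a_j}{w_j}\right) \subseteq B\!\left(p_i, \frac{(1+\delta) y - a_i}{w_i}\right)$. This is exactly the stated condition.
Indeed, by rearrangement,

$$y \left( \frac{1+\delta}{w_i} - \frac{1}{w_j} \right) \ge \|p_i - p_j\| + \frac{a_i}{w_i} - \frac{a_j}{w_j}.$$

Necessity: Notice that $B\!\left(p_j, \frac{y - a_j}{w_j}\right)$ has a boundary point at distance $\frac{y - a_j}{w_j}$ from $p_j$ on the directed
line from $p_i$ to $p_j$, on the other side of $p_j$ from $p_i$, while $B\!\left(p_i, \frac{(1+\delta) y - a_i}{w_i}\right)$ has the intercept of $\frac{(1+\delta) y - a_i}{w_i} -
\|p_i - p_j\|$. For the condition to hold it must be true that $\frac{(1+\delta) y - a_i}{w_i} - \|p_i - p_j\| \ge \frac{y - a_j}{w_j}$, which is
also the stated condition. ■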
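
The threshold of Lemma 5.2 can likewise be checked against a direct containment test for the two balls. This is a small numeric sketch under the lemma's assumptions ($w_i \le w_j$ and $\delta > 0$); the function names and sample values are illustrative.

```python
import math

def containment_threshold(p_i, w_i, a_i, p_j, w_j, a_j, delta):
    """Smallest admissible y from which (f_j)_{<=y} lies inside (f_i)_{<=(1+delta)y}.
    Assumes w_i <= w_j and delta > 0, so the denominator is positive."""
    numerator = math.dist(p_i, p_j) + a_i / w_i - a_j / w_j
    denominator = (1 + delta) / w_i - 1 / w_j
    return max(a_i, a_j, numerator / denominator)

def ball_contained(y, p_i, w_i, a_i, p_j, w_j, a_j, delta):
    """Direct check that B(p_j, (y-a_j)/w_j) is inside B(p_i, ((1+delta)y-a_i)/w_i)."""
    outer = ((1 + delta) * y - a_i) / w_i
    return outer >= math.dist(p_i, p_j) + (y - a_j) / w_j
```

For example, with $p_i = (0,0)$, $w_i = 1$, $a_i = 0$, $p_j = (3,0)$, $w_j = 2$, $a_j = 1$ and $\delta = 1/2$, the numerator is $2.5$ and the denominator is $1$, so containment starts at $y = 2.5$.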

It is easy to see that compactness (PI) and bounded growth (P2) hold for the set of functions 
T (for (P2) we can take the growth function A(j^)(y) = (y — a,j)/wi for y > ai and the growth 
constant C to be 2). The following lemma proves the sketch property (P3). 
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Lemma 5.3. For any Q '^ T and 5 > there is a {5, yo)- sketch T-i C Q with \T-L\ = 1 and yo = 

3c\{g)\g\/5. 

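Proof: If $|\mathcal{G}| = 1$ we can let $\mathcal{H} = \mathcal{G}$ and the result is easily seen to be true. Otherwise, let $l = \mathrm{cl}(\mathcal{G})$ for brevity. Observe that $l \ge \max_i \alpha_i$, as otherwise some $(f_i)_{\preceq l} = \emptyset$ and cannot be part of a connected collection of sets. Let $|\mathcal{G}| = m \ge 2$ and let $\mathcal{G} = \{f_1, \ldots, f_m\}$, and assume that we have $w_1 \le w_i$, $1 \le i \le m$. We let $\mathcal{H} = \{f_1\}$, the function with the minimum associated weight. We are restricted to the range $l \ge \alpha_i$, $1 \le i \le m$, so $(f_i)_{\preceq l}$ is the ball $B\left(p_i, \frac{l-\alpha_i}{w_i}\right)$ for each $1 \le i \le m$. Since $\mathcal{G}_{\preceq l}$ is connected, it must be true that for $2 \le j \le m$ there exists a sequence of distinct indices $1 = i_1, i_2, \ldots, i_{k-1}, i_k = j$ such that $B\left(p_{i_r}, \frac{l-\alpha_{i_r}}{w_{i_r}}\right) \cap B\left(p_{i_{r+1}}, \frac{l-\alpha_{i_{r+1}}}{w_{i_{r+1}}}\right) \neq \emptyset$ for $1 \le r \le k-1$. By Lemma 5.1 we can write that,

$$\|p_{i_r} - p_{i_{r+1}}\| + \frac{\alpha_{i_r}}{w_{i_r}} + \frac{\alpha_{i_{r+1}}}{w_{i_{r+1}}} \le \frac{l}{w_{i_r}} + \frac{l}{w_{i_{r+1}}}.$$

Rearranging,

$$\|p_{i_r} - p_{i_{r+1}}\| \le \left(\frac{l}{w_{i_r}} + \frac{l}{w_{i_{r+1}}}\right) - \left(\frac{\alpha_{i_r}}{w_{i_r}} + \frac{\alpha_{i_{r+1}}}{w_{i_{r+1}}}\right) \le \frac{2l}{w_1},$$

as $w_1 \le w_i$ for $1 \le i \le m$. It follows by the triangle inequality and the above, that

$$\|p_{i_1} - p_{i_k}\| \le \sum_{r=1}^{k-1} \|p_{i_r} - p_{i_{r+1}}\| \le \frac{2(m-1)l}{w_1} \le \frac{2ml}{w_1}.$$

Thus we have,

$$\|p_1 - p_j\| \le \frac{2ml}{w_1}, \qquad (2)$$

for $j = 1, \ldots, m$. Let $y_0 = \frac{3ml}{\delta} = \frac{3\,\mathrm{cl}(\mathcal{G})\,|\mathcal{G}|}{\delta}$. Then, for $y \ge y_0$ we have that,

$$y \ge \frac{3ml}{\delta} = \frac{\frac{2ml}{w_1} + \frac{ml}{w_1}}{\frac{\delta}{w_1}} \ge \frac{\frac{2ml}{w_1} + \frac{l}{w_1}}{\frac{\delta}{w_1}},$$

for $m \ge 2$. Using Eq. (2) and the above, we have for $y \ge y_0$, since $l \ge \alpha_1$,

$$y \ge \frac{\|p_1 - p_j\| + \frac{l}{w_1}}{\frac{\delta}{w_1}} \ge \frac{\|p_1 - p_j\| + \frac{\alpha_1}{w_1}}{\frac{\delta}{w_1}}.$$

It follows that for $y \ge y_0$,

$$y \ge \frac{\|p_1 - p_j\| + \frac{\alpha_1}{w_1}}{\frac{\delta}{w_1}} \ge \frac{\|p_1 - p_j\| + \frac{\alpha_1}{w_1} - \frac{\alpha_j}{w_j}}{\frac{1+\delta}{w_1} - \frac{1}{w_j}},$$

as $w_1 \le w_j$ for $1 \le j \le m$. Thus, by Lemma 5.2, $B\left(p_j, \frac{y-\alpha_j}{w_j}\right) \subseteq B\left(p_1, \frac{(1+\delta)y-\alpha_1}{w_1}\right)$ for $y \ge y_0$, and therefore by definition, $\mathcal{H}$ is a $(\delta, y_0)$-sketch for $\mathcal{G}$. ■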
Figure 5.1: Being $\alpha$-rounded fat.

We thus get the following result.

Restatement of Theorem 3.3. Consider a set P of n points in $\mathbb{R}^d$, where the ith point $p_i$ has additive weight $\alpha_i \ge 0$ and multiplicative weight $w_i > 0$. The ith point induces the additive/multiplicative distance function $f_i(q) = w_i \|q - p_i\| + \alpha_i$. Then one can compute a $(1+\varepsilon)$-AVD for these distance functions, with near linear space complexity, and logarithmic query time. See Theorem 3.1 for the exact bounds.
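
To make the objects in this theorem concrete, the following is a minimal Python sketch of the additive/multiplicative distance functions and of a brute-force query. It is not the AVD data-structure of the paper; it is only the linear-scan baseline that the AVD is meant to approximate. The names `weighted_distance`, `nearest_site`, and `is_approximate_answer` are hypothetical, and each site is assumed to be given as a triple (point, multiplicative weight, additive weight).

```python
import math

def weighted_distance(q, p, w, alpha):
    """f(q) = w * ||q - p|| + alpha, for a site p with w > 0 and alpha >= 0."""
    return w * math.dist(q, p) + alpha

def nearest_site(q, sites):
    """Brute-force evaluation of the minimization diagram at q.

    sites is a list of (p, w, alpha) triples; returns (index, value).
    """
    values = [weighted_distance(q, p, w, a) for (p, w, a) in sites]
    best = min(range(len(sites)), key=values.__getitem__)
    return best, values[best]

def is_approximate_answer(q, sites, i, eps):
    """True if site i is a (1 + eps)-approximate nearest site for q."""
    _, opt = nearest_site(q, sites)
    p, w, a = sites[i]
    return weighted_distance(q, p, w, a) <= (1 + eps) * opt

# Illustrative data (made up for this example).
sites = [((0.0, 0.0), 1.0, 0.0), ((4.0, 0.0), 2.0, 1.0), ((0.0, 3.0), 0.5, 2.0)]
print(nearest_site((1.0, 0.0), sites))
```

An AVD region is valid for a query q exactly when the function stored with the region passes a test like `is_approximate_answer` for every q in the region.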

5.2. Scaling distance — generalized polytope distances 

Let $O \subseteq \mathbb{R}^d$ be a compact set homeomorphic to $B(0, 1)$ and containing a "center" point $\mathsf{p}$ in its interior. Then O is star shaped if for any point $v \in O$ the entire segment $\mathsf{p}v$ is also in O. Naturally, any convex body O with any center $\mathsf{p} \in O$ is star shaped. The t-scaling of O with a center $\mathsf{p}$ is the set $tO = \{\, t(v - \mathsf{p}) + \mathsf{p} \mid v \in O \,\}$.

Given a star shaped object O with a center $\mathsf{p}$, the scaling distance of a point q from O is the minimum t such that $q \in tO$, and let $F_O(q)$ denote this distance function. Note that, for any $y \ge 0$, the sublevel set $(F_O)_{\preceq y}$ is the y-scaling of O, that is $(F_O)_{\preceq y} = yO$.

Note, that for a point $p \in \mathbb{R}^d$, if we take $O = B(p, 1)$ with center p, then $F_O(q) = \|p - q\|$. That is, this distance notion is a strict extension of the Euclidean distance.

Henceforward, for this section, we assume that an object O contains the origin in its interior 
and the origin is the designated center, unless otherwise stated. 
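
As an illustration of the scaling distance under this convention, here is a small Python sketch for the special case of a convex polytope given by halfspaces. This representation is an assumption made for the example; the paper allows general star shaped objects. If $O = \{x : \langle a_k, x\rangle \le b_k \text{ for all } k\}$ with every $b_k > 0$ (so the origin is interior), then $q \in tO$ exactly when $\langle a_k, q\rangle \le t\,b_k$ for all k, and the scaling distance is the largest ratio $\langle a_k, q\rangle / b_k$. The function name `scaling_distance` is hypothetical.

```python
def scaling_distance(q, halfspaces):
    """Scaling distance of q from the convex body {x : a.x <= b for all (a, b)}.

    Assumes every b > 0, i.e. the origin (the designated center) is interior.
    """
    return max(0.0, max(sum(ai * qi for ai, qi in zip(a, q)) / b
                        for a, b in halfspaces))

# The square [-1, 1]^2, centered at the origin.
square = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 1.0),
          ((0.0, 1.0), 1.0), ((0.0, -1.0), 1.0)]

print(scaling_distance((2.0, 0.5), square))
```

With this function, membership of q in the sublevel set $yO$ is simply the test `scaling_distance(q, halfspaces) <= y`.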

Definition 5.4. Let $O \subseteq \mathbb{R}^d$ be a star shaped object centered at $\mathsf{p}$. We say that O is $\alpha$-fat if there is a number r such that $\mathrm{ball}(\mathsf{p}, r) \subseteq O \subseteq \mathrm{ball}(\mathsf{p}, \alpha r)$.

Definition 5.5. Let O be a star shaped object centered at $\mathsf{p}$. We say that O is $\alpha$-rounded fat if there is a radius r such that, (i) $\mathrm{ball}(\mathsf{p}, r) \subseteq O \subseteq \mathrm{ball}(\mathsf{p}, \alpha r)$ and, (ii) for every point p in the boundary of O, the cone $\mathcal{CH}(\mathrm{ball}(\mathsf{p}, r) \cup p)$ lies within O, see Figure 5.1.

By definition any $\alpha$-rounded fat object is also $\alpha$-fat. However, it is not true that an $\alpha$-fat object is necessarily $\alpha'$-rounded fat for any $\alpha'$, even if $\alpha'$ is allowed to depend on $\alpha$. The following useful result is easy to see, also see Figure 5.2 for an illustration.

Figure 5.2: An $\alpha$-fat convex body is $\alpha$-rounded fat.

Lemma 5.6. Let O be an $\alpha$-fat object. If O is convex then O is also $\alpha$-rounded fat.

Given a set $\mathcal{O} = \{O_1, O_2, \ldots, O_n\}$ of n star shaped objects, consider the set $\mathcal{F}$ of n scaling distance functions, where the ith function, for $i = 1, \ldots, n$, is $f_i = F_{O_i}$. We assume that the boundary of each object $O_i$ has constant complexity.

We next argue that $\mathcal{F}$ complies with the framework of Section 2.3. Using standard techniques, we can compute the quantities required in conditions (C1)-(C3), including the diameter of the sublevel set $\mathrm{diam}(yO_i) = y\,\mathrm{diam}(O_i)$. Also, trivially we have that condition (P1) is satisfied as the sublevel sets are dilations of the $O_i$ and are thus compact by definition. The next few lemmas establish that both bounded growth (P2) and the sketch property (P3) are also true, if the objects are also $\alpha$-rounded fat for some constant $\alpha$.

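Lemma 5.7. Given $\alpha > 0$, suppose O is a star shaped object that is $\alpha$-rounded fat. Then for any $c \ge 2\alpha$ and any $y \ge 0$, $\varepsilon > 0$ we have that $yO \oplus B(0, (\varepsilon/c)\,\mathrm{diam}(yO)) \subseteq (1+\varepsilon)yO$; that is,

$$(F_O)_{\preceq y} \oplus B\bigl(0, (\varepsilon/c)\,\mathrm{diam}((F_O)_{\preceq y})\bigr) \subseteq (F_O)_{\preceq (1+\varepsilon)y}.$$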

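Proof: Since $(F_O)_{\preceq y} = yO$ we show that $yO \oplus B(0, (\varepsilon/c)\,\mathrm{diam}(yO)) \subseteq (1+\varepsilon)yO$. Let r be the radius guaranteed by Definition 5.5 for O. Clearly $\mathrm{diam}(yO) = y\,\mathrm{diam}(O) \le 2y\alpha r$. We show that for every $p \in \partial(yO)$ we have that $p + B(0, (\varepsilon/c)\,\mathrm{diam}(yO)) \subseteq (1+\varepsilon)yO$ ¹. Let $p \in \partial(yO)$. It is sufficient to show that $B(p, 2\varepsilon y\alpha r/c) \subseteq (1+\varepsilon)yO$. Clearly $p' = (1+\varepsilon)p \in \partial\bigl((1+\varepsilon)yO\bigr)$. Since the cone $\mathcal{CH}\bigl(B(0, (1+\varepsilon)yr) \cup p'\bigr)$ is in $(1+\varepsilon)yO$, it is clear that the ball centered at p of radius

$$x = \|p - p'\| \frac{(1+\varepsilon)yr}{\|p'\|} = \|p - p'\| \frac{yr}{\|p\|}$$

is completely within $(1+\varepsilon)yO$, see Figure 5.3. Now, $\|p - p'\| = \varepsilon\|p\| \ge \varepsilon y r$, and $\|p\| \ge yr$. It follows that $x \ge \varepsilon y r$. If we choose $c \ge 2\alpha$, the claimed result is easily seen to hold. ■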
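
The last step of this proof can be written out explicitly. The following short derivation uses only quantities from the proof, with x denoting the radius of the ball around p that fits inside the cone with apex $p' = (1+\varepsilon)p$; it is a sketch of the omitted arithmetic rather than a quotation from the original.

```latex
\begin{align*}
x &= \|p - p'\| \cdot \frac{yr}{\|p\|}
   = \varepsilon \|p\| \cdot \frac{yr}{\|p\|}
   = \varepsilon y r, \\
\frac{\varepsilon}{c}\,\mathrm{diam}(yO)
  &\le \frac{\varepsilon}{c} \cdot 2 y \alpha r
   \le \varepsilon y r = x
   \qquad \text{whenever } c \ge 2\alpha .
\end{align*}
```

Hence $B\bigl(p, (\varepsilon/c)\,\mathrm{diam}(yO)\bigr) \subseteq B(p, x) \subseteq (1+\varepsilon)yO$ for every boundary point p of $yO$.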



¹ Topological arguments can show that for objects homeomorphic to balls, if this is true for boundary points p, then for any $p \in O$, $p + B(0, (\varepsilon/c)\,\mathrm{diam}(yO)) \subseteq (1+\varepsilon)yO$. We omit the technical argument here.

Figure 5.3: The $(1+\varepsilon)$ expansion of $yO$ contains $B(p, x)$.



By the above lemma we can take the growth function $\lambda_{F_{O_i}}(y) = \mathrm{diam}\bigl((F_{O_i})_{\preceq y}\bigr) = y\,\mathrm{diam}(O_i)$ and the growth constant, see (P2), for the set of functions $F_{O_i}$ to be $\zeta = c = 2\alpha$. If the object O is $\alpha$-fat but not $\alpha'$-rounded fat for any constant $\alpha' > 0$ then it may be that its scaling distance function grows arbitrarily quickly and thus fails to comply with our framework, see Figure 5.4. It is not hard to see that Lemma 5.7 implies that bounded growth (P2) is satisfied for all the functions $f_1, \ldots, f_n$ when the objects under consideration $O_1, \ldots, O_n$ are $\alpha$-rounded fat. Showing that condition (P3) is satisfied is slightly harder.

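Lemma 5.8. Let $\mathcal{O}$ be a set of n star shaped objects $O_1, O_2, \ldots, O_n$. Let $\alpha \ge 1$ be any constant. Suppose that $O_1, \ldots, O_n$ are $\alpha$-rounded fat. Then, for any $\delta > 0$, there is a subset $\mathcal{I} \subseteq \{1, 2, \ldots, n\}$ with $|\mathcal{I}| = O(\delta^{-d})$, such that for all $y \ge 0$, we have

$$\bigcup_{i \in [n]} yO_i \subseteq \bigcup_{i \in \mathcal{I}} (1+\delta)yO_i.$$

Moreover, for every $i \in \mathcal{I}$ we have that $\mathrm{diam}(O_i) = \Omega(\max_j \mathrm{diam}(O_j))$.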

Proof: Recall our convention, that here all the bodies are centered at the origin. Clearly it is sufficient to show this for $y = 1$. Let $r_i$, for $i = 1, \ldots, n$, be the radius of the ball satisfying the conditions of Definition 5.5 for object $O_i$. Assume that $r_1 \ge r_2 \ge \cdots \ge r_n$. If $\alpha r_j < r_1$ for some j, then $O_j, \ldots, O_n$ are contained in $O_1$, and we can add 1 to the set $\mathcal{I}$. From now on we assume that for each $1 \le j \le n$, $\alpha r_j \ge r_1$, i.e. we can ignore the sufficiently small objects. Our index set $\mathcal{I}$ is a subset of these prefix indices for which $\alpha r_i \ge r_1$. As such,

$$\mathrm{diam}(O_i) \ge 2r_i \ge \frac{2}{\alpha} r_1 = \frac{1}{\alpha^2}(2\alpha r_1) \ge \frac{1}{\alpha^2} \max_{1 \le j \le n} \mathrm{diam}(O_j)$$

for any $i \in \mathcal{I}$. It is easy to see that $\bigcup_{i \in [n]} O_i \subseteq \bigcup_{i \in [n]} B(0, \alpha r_i) \subseteq B(0, \alpha r_1)$. We tile the ball $B(0, \alpha r_1)$ with cubes of diameter at most $\delta\alpha r_1/c'$ where $c'$ is a constant that we determine shortly. Notice that the number of such cubes is $O(\delta^{-d})$. Let C denote the set of these cubes. If $c \cap O_i \neq \emptyset$ for some $1 \le i \le n$ and $c \in C$ then we add c to a set A and i to our index set $\mathcal{I}$. Notice that we choose at most one object among all objects that might intersect c. Now, $\bigcup_{i \in [n]} O_i \subseteq \bigcup_{c \in A} c$, as $\bigcup_{c \in C} c$ covers $B(0, \alpha r_1)$. Observe, $|\mathcal{I}| \le |A| \le |C| = O(\delta^{-d})$. We show that it is possible to choose $c'$ so large that $\bigcup_{c \in A} c \subseteq \bigcup_{i \in \mathcal{I}} (1+\delta)O_i$. Since $c \cap O_i \neq \emptyset$ and $\mathrm{diam}(c) \le \delta\alpha r_1/c'$, we have $c \subseteq O_i \oplus B(0, \delta\alpha r_1/c')$. We choose $c'$ large enough so that $\delta\alpha r_1/c' \le \delta\,\mathrm{diam}(O_i)/c$, where $c = 2\alpha$ is the constant from Lemma 5.7. Then we will have by Lemma 5.7,

$$c \subseteq O_i \oplus B(0, \delta\alpha r_1/c') \subseteq O_i \oplus B(0, \delta\,\mathrm{diam}(O_i)/c) \subseteq (1+\delta)O_i,$$

proving the claim. Now,

$$\frac{\delta\alpha r_1}{c'} \le \frac{\delta\alpha^2 r_i}{c'} \le \frac{\delta\alpha^2\,\mathrm{diam}(O_i)}{2c'} \le \frac{\delta\,\mathrm{diam}(O_i)}{c}$$

if $c' = c\alpha^2/2 = \alpha^3$.

Figure 5.4: The object O is $\alpha$-fat but not $\alpha'$-rounded fat. In particular, the point p is in $O \oplus B(0, (\varepsilon/c)\,\mathrm{diam}(O))$ but not in $(1+\varepsilon)O$. In particular, the scaling distance function is discontinuous at u.



Lemma 5.9. Let $\alpha \ge 1$ be any constant. Let O be a star shaped object that is $\alpha$-rounded fat, and let $\delta > 0$. Let $u \in \mathbb{R}^d$ with $\|u\| \le \delta\,\mathrm{diam}(O)/c$ where $c = 2\alpha$. Then we have that $O + u \subseteq (1+\delta)O$.

Proof: We have, $O + u \subseteq O \oplus B(0, \|u\|) \subseteq O \oplus B(0, \delta\,\mathrm{diam}(O)/c)$ as $\|u\| \le \delta\,\mathrm{diam}(O)/c$. The result follows by appealing to Lemma 5.7. ■

Lemma 5.10. For $i = 1, \ldots, n$, let $O_i$ be a star shaped object in $\mathbb{R}^d$ centered at a point $p_i$. Let $\mathcal{O} = \{O_1, \ldots, O_n\}$, $P = \{p_1, \ldots, p_n\}$, and $\mathcal{F} = \{f_i \mid 1 \le i \le n\}$, where $f_i = F_{O_i}$, for $i = 1, \ldots, n$. For $i = 1, \ldots, n$, let $r_i$ denote the radius of the ball for $O_i$ from Definition 5.5, and let $r = \max_i r_i$. Then, $\mathrm{cl}(\mathcal{F}) \ge \mathrm{diam}(P)/(2n\alpha r)$.

Proof: The claim is trivially true if $\mathrm{diam}(P) = 0$, i.e. all the points $p_i$ are the same. Let $l = \mathrm{cl}(\mathcal{F})$, for brevity. As we have $(f_i)_{\preceq l} = lO_i$, where the scaling is done around its center $p_i$, it follows that the sets $lO_i$ for $i = 1, \ldots, n$ are connected. Since $lO_i \subseteq B(p_i, l\alpha r_i) \subseteq B(p_i, l\alpha r)$ it is easy to see that the balls $B(p_i, l\alpha r)$ for $i = 1, \ldots, n$ are also connected. Let $u, v \in P$ be such that $\|u - v\| = \mathrm{diam}(P)$. There is a sequence of distinct $i_1, \ldots, i_k \in \{1, \ldots, n\}$ such that $u = p_{i_1}$, $v = p_{i_k}$ and we have $B(p_{i_t}, l\alpha r) \cap B(p_{i_{t+1}}, l\alpha r) \neq \emptyset$ for $1 \le t \le k-1$. It follows that $\|p_{i_t} - p_{i_{t+1}}\| \le 2l\alpha r$, $1 \le t \le k-1$. By the triangle inequality,

$$\mathrm{diam}(P) = \|u - v\| = \|p_{i_1} - p_{i_k}\| \le \sum_{t=1}^{k-1} \|p_{i_t} - p_{i_{t+1}}\| \le \sum_{t=1}^{k-1} 2l\alpha r = 2(k-1)l\alpha r \le 2nl\alpha r,$$

thus proving the claim. ■
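
The bound of Lemma 5.10 can be checked numerically in the simplest case, where every object is a Euclidean ball $O_i = B(p_i, \rho_i)$. This is a sketch under assumptions that are not spelled out in this section: the connectivity level is taken to be the smallest level l at which the intersection graph of the sublevel sets $B(p_i, l\rho_i)$ is connected, and a ball is treated as 1-rounded fat with $r_i = \rho_i$. The function name `connectivity_level` is hypothetical. Two scaled balls first touch at level $\|p_i - p_j\|/(\rho_i + \rho_j)$, so the connectivity level is the bottleneck edge of a minimum spanning tree, found here with a Kruskal-style union-find.

```python
import math
from itertools import combinations

def connectivity_level(centers, radii):
    """Smallest l such that the balls B(p_i, l * rho_i) form a connected intersection graph.

    Pairs are processed by the level at which they first touch,
    l_ij = ||p_i - p_j|| / (rho_i + rho_j), until one component remains.
    """
    n = len(centers)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(centers[i], centers[j]) / (radii[i] + radii[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    components = n
    level = 0.0
    for l, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
            level = l
            if components == 1:
                break
    return level

# Illustrative data (made up for this example).
centers = [(0.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
radii = [1.0, 1.0, 2.0]
level = connectivity_level(centers, radii)
diam_P = max(math.dist(u, v) for u, v in combinations(centers, 2))
lower_bound = diam_P / (2 * len(centers) * 1.0 * max(radii))  # alpha = 1 for balls
print(level, lower_bound)
```

For this input the balls become connected at level 2, comfortably above the lower bound $\mathrm{diam}(P)/(2n\alpha r) = 10/12$ given by the lemma.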

We can now show that condition (P3) holds for the $F_{O_i}$.

Lemma 5.11. Consider the setting of Lemma 5.10. Given $\delta > 0$, there is an index set $\mathcal{I} \subseteq \{1, \ldots, n\}$ with $|\mathcal{I}| = O(\delta^{-d})$ and $y_0 = O(l \cdot n/\delta)$ such that the functions $\{f_i \mid i \in \mathcal{I}\}$ form a $(\delta, y_0)$-sketch, where $l = \mathrm{cl}(\mathcal{F})$.

Proof: We provide a sketch of the proof as details are easy but tedious. For each $1 \le i, j \le n$ we consider the set of objects $O_{ij} = O_i + p_j - p_i$, i.e. $O_{ij}$ is $O_i$ translated so that it is centered at $p_j$. By Lemma 5.8 there is an index set $\mathcal{I} \subseteq \{1, \ldots, n\}$ with $|\mathcal{I}| = O(\delta^{-d})$ such that for all y and any fixed j with $1 \le j \le n$ we have that $\bigcup_{i \in [n]} yO_{ij} \subseteq \bigcup_{i \in \mathcal{I}} (1+\delta/4)yO_{ij}$. Let $r_i$ denote the radius of the ball for $O_i$ from Definition 5.5, and let $r = \max_i r_i$. By Lemma 5.10, we have that $l \ge \mathrm{diam}(P)/(2n\alpha r)$. Lemma 5.8 finds an $\mathcal{I}$ such that for all $i \in \mathcal{I}$, $r_i = \Omega(r)$. A translated copy $O_{ij} = O_i + p_j - p_i$ is a translation by a vector $u = p_j - p_i$. As $l \ge \mathrm{diam}(P)/(2n\alpha r)$, there is a $y_0 = O(ln/\delta)$ such that $\|p_j - p_i\| \le \delta\,\mathrm{diam}(y_0 O_i)/(4c)$ for all $1 \le i, j \le n$, where $c = 2\alpha$. Thus using Lemma 5.9, $(1+\delta/4)y_0 O_i + (p_j - p_i) \subseteq (1+\delta/4)^2 y_0 O_i \subseteq (1+\delta)y_0 O_i$. Clearly this also holds for any $y \ge y_0$. Thus for $y \ge y_0$ we have that $(1+\delta)yO_i$ covers $yO_i + (p_j - p_i)$ for $1 \le i \le n$. It is then easy to see that $\{f_i \mid i \in \mathcal{I}\}$ is a $(\delta, y_0)$-sketch. ■

We conclude that for $\alpha$-rounded fat objects, the scaling distance function they define falls under our framework. We thus get the following result.

Restatement of Theorem 3.4. Consider a set $\mathcal{O}$ of $\alpha$-rounded fat objects in $\mathbb{R}^d$, for some constant $\alpha$. Then one can compute a $(1+\varepsilon)$-AVD for the scaling distance functions induced by $\mathcal{O}$, with near linear space complexity, and logarithmic query time. See Theorem 3.1 and Corollary 3.2 for the exact bounds.

Note, that the result in Theorem 3.4 covers any symmetric convex metric. Indeed, given a convex symmetric shape C centered at the origin, the distance it induces for any pair of points $p, u \in \mathbb{R}^d$ is the scaling distance of C centered at p to u (or, by symmetry, the scaling distance of p from C centered at u). Under this distance $\mathbb{R}^d$ is a metric space, and of course, the triangle inequality holds. By an appropriate scaling of space, which does not affect the norm (except for scaling it), we can make C fat, and now Theorem 3.4 applies. Of course, Theorem 3.4 is considerably more general, allowing each of the points to induce a different scaling distance function, and the distance induced does not have to comply with the triangle inequality.
5.3. Nearest furthest-neighbor

For a set of points $S \subseteq \mathbb{R}^d$ and a point q, the furthest-neighbor distance of q from S is $F_S(q) = \max_{s \in S} \|q - s\|$; that is, it is the furthest one might have to travel from q to arrive at a point of S. For example, S might be the set of locations of facilities, where it is known that one of them is always open, and one is interested in the worst case distance a client has to travel to reach an open facility. The function $F_S(\cdot)$ is known as the furthest-neighbor Voronoi diagram, and while its worst case combinatorial complexity is similar to the regular Voronoi diagram, it can be approximated using a constant size representation (in low dimensions), see [Har99].

Given n sets of points $P_1, \ldots, P_n$ in $\mathbb{R}^d$, we are interested in the distance function $F(q) = \min_i F_i(q)$, where $F_i(q) = F_{P_i}(q)$. This quantity arises naturally when one tries to model uncertainty; indeed, let $P_i$ be the set of possible locations of the ith point (i.e., the location of the ith point is chosen randomly, somehow, from the set $P_i$). Thus, $F_i(q)$ is the worst case distance to the ith point, and $F(q)$ is the worst-case nearest neighbor distance to the random point-set generated by picking the ith point from $P_i$, for $i = 1, \ldots, n$. We refer to $F(\cdot)$ as the nearest furthest-neighbor distance, and we are interested in its approximation.
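
The definition can be illustrated by a direct, exact evaluation. The Python sketch below is only the brute-force baseline whose cost is linear in the total number of points, not the data-structure developed in this section; the function names and sample data are hypothetical.

```python
import math

def furthest_neighbor_distance(q, S):
    """F_S(q) = max over s in S of ||q - s||."""
    return max(math.dist(q, s) for s in S)

def nearest_furthest_neighbor(q, point_sets):
    """Return (i, F_i(q)) minimizing the furthest-neighbor distance over the sets."""
    values = [furthest_neighbor_distance(q, P) for P in point_sets]
    i = min(range(len(point_sets)), key=values.__getitem__)
    return i, values[i]

# Each inner list holds the possible locations of one uncertain point.
point_sets = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(5.0, 0.0), (5.0, 1.0)],
]
print(nearest_furthest_neighbor((1.0, 0.0), point_sets))
print(nearest_furthest_neighbor((5.0, 0.5), point_sets))
```

The first query is served by the first uncertain point (worst-case distance 1), the second by the second uncertain point (worst-case distance 0.5).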

A naive solution to this problem would maintain a data structure for computing the furthest neighbor approximately for each of the $P_i$ and then just compute the minimum of those distances. A data-structure to compute a $(1-\varepsilon)$-approximation to the furthest neighbor takes $O(1/\varepsilon^d)$ space and $O(1/\varepsilon^d)$ query time, see [Har99], although this was probably known before. Thus the entire data structure would take up total space of $O(n/\varepsilon^d)$ with a query time of $O(n/\varepsilon^d)$. By using our general framework we can speed up the computation. We will show that $F_i$, for $i = 1, \ldots, n$, satisfy the conditions (P1)-(P3) and (C1)-(C3). By Theorem 3.1 we can prepare a data-structure of size $O(n\,\mathrm{polylog}(n))$ that allows us a query time of $O(\log n)$ to find the desired nearest furthest-neighbor approximately. In order to facilitate the computations of distances we also maintain data structures for $(1-\varepsilon/4)$-approximate furthest neighbor search for each of the point sets $P_i$ for $i = 1, 2, \ldots, n$, where $\varepsilon$ is the approximation parameter for approximating the nearest furthest-neighbor, i.e. the approximation parameter for the problem we are trying to solve. Also, for $\mu = \varepsilon^2/144$ we maintain $\mu$-coresets for computing the minimum enclosing ball (MEB) approximately for each $P_i$ for $i = 1, 2, \ldots, n$. Each such coreset has $O(1/\varepsilon^2)$ points, see [BHI02, BC03b, BC03a]. For each i with $1 \le i \le n$, the radius of the MEB of the coreset points is a $(1+\mu)$-approximation to the radius of the MEB of $P_i$.

5.3.1. Satisfaction of the conditions 

Observation 5.12. We have that $(F_i)_{\preceq y} = \bigcap_{u \in P_i} B(u, y)$, and $\mathrm{diam}\bigl((F_i)_{\preceq y}\bigr) \le 2y$.

Given the above observation, it is easy to see that Condition (P1) is true, as $(F_i)_{\preceq y}$ is a finite intersection of compact sets. The following Lemma shows that Condition (P2) is also true, by letting the growth function $\lambda_{F_i}(y) = y$. Since $y \ge \mathrm{diam}\bigl((F_i)_{\preceq y}\bigr)/2$ by Observation 5.12, it follows that we can choose the growth constant $\zeta$ to be 2.

Lemma 5.13. For any i with $1 \le i \le n$, if $(F_i)_{\preceq y} \neq \emptyset$, it is true that,

$$(F_i)_{\preceq y} \oplus B(0, \varepsilon y) \subseteq (F_i)_{\preceq (1+\varepsilon)y}.$$

Proof: Consider any point q in $(F_i)_{\preceq y} \oplus B(0, \varepsilon y)$. It is easy to see that $B(q, (1+\varepsilon)y) \supseteq P_i$ by the triangle inequality, and so $q \in (F_i)_{\preceq (1+\varepsilon)y}$. ■
Condition (P3) is implied by the following,

Lemma 5.14. Let $\mathcal{G} \subseteq \{F_1, \ldots, F_n\}$ denote any set of functions. Then, given any $\delta > 0$, there is a subset $\mathcal{H} \subseteq \mathcal{G}$ with $|\mathcal{H}| = 1$ and a $y_0$ with $y_0 = O(\mathrm{cl}(\mathcal{G})\,|\mathcal{G}|/\delta)$ such that $\mathcal{H}$ is a $(\delta, y_0)$-sketch for $\mathcal{G}$.

Proof: Without loss of generality, let $\mathcal{G} = \{F_1, \ldots, F_m\}$ where $m = |\mathcal{G}|$. Let $z = \mathrm{cl}(\mathcal{G})$. Since $(F_i)_{\preceq z}$ for $i = 1, \ldots, m$ are all connected, and by Observation 5.12, for each i with $1 \le i \le m$ we have that $\mathrm{diam}\bigl((F_i)_{\preceq z}\bigr) \le 2z$, it follows that for any two points $u \in P_j$, $v \in P_k$ with $1 \le j, k \le m$ there are points $u' \in (F_j)_{\preceq z}$, $v' \in (F_k)_{\preceq z}$, such that $\|u - u'\| \le z$, $\|v - v'\| \le z$ (by definition of the functions $F_j$ and $F_k$ respectively) and $\|u' - v'\| \le 2mz$, by the bound on the diameter of the sublevel sets and the condition of being connected, which is the same as the intersection graph of the sets being connected. It follows by the triangle inequality that $\|u - v\| \le 2(m+1)z \le 4mz$, i.e. $\mathrm{diam}(P_1 \cup \cdots \cup P_m) \le 4mz$. Let $\mathcal{H}$ be a set containing an arbitrary function from $\mathcal{G}$, say $\mathcal{H} = \{F_1\}$. It is not too hard to see (or one can apply Lemma 5.2 with the case $\alpha_i = 0$, $w_i = 1$), that for every $i = 1, \ldots, m$, $B(v, y) \subseteq B(u, (1+\delta)y)$ for any points $u, v \in P_1 \cup \cdots \cup P_m$ if $y \ge y_0 = 4mz/\delta$, from the bound on the diameter of $P_1 \cup \cdots \cup P_m$. Thus, for any i with $1 \le i \le m$ we have that,

$$(F_i)_{\preceq y} = \bigcap_{v \in P_i} B(v, y) \subseteq B(u, (1+\delta)y),$$

for all $u \in P_1$ and $y \ge y_0$. As such,

$$(F_i)_{\preceq y} \subseteq \bigcap_{u \in P_1} B(u, (1+\delta)y) = (F_1)_{\preceq (1+\delta)y},$$

and the result follows. ■
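
The containment at the heart of this proof can be spot-checked pointwise. The sketch below is an illustration under stated simplifications, not the construction used by the data-structure: instead of the threshold $4mz/\delta$ (which needs the connectivity level z), it uses $y_0 = D/\delta$ with $D = \mathrm{diam}(P_1 \cup \cdots \cup P_m)$, which the proof shows is at most $4mz/\delta$. For a query q with $F_i(q) \ge y_0$, containment of sublevel sets means $F_1(q) \le (1+\delta)F_i(q)$. All names and the random data are hypothetical.

```python
import math
import random

def furthest(q, P):
    """Furthest-neighbor distance from q to the point set P."""
    return max(math.dist(q, s) for s in P)

def sketch_threshold(point_sets, delta):
    """y0 = D / delta, where D is the diameter of the union of all point sets."""
    pts = [p for P in point_sets for p in P]
    D = max(math.dist(u, v) for u in pts for v in pts)
    return D / delta

def violates_sketch(q, point_sets, delta, y0):
    """True if some F_i(q) >= y0 but F_1(q) > (1 + delta) * F_i(q)."""
    f1 = furthest(q, point_sets[0])
    return any(
        furthest(q, P) >= y0 and f1 > (1 + delta) * furthest(q, P) + 1e-12
        for P in point_sets
    )

random.seed(0)
point_sets = [[(random.random(), random.random()) for _ in range(5)] for _ in range(4)]
delta = 0.25
y0 = sketch_threshold(point_sets, delta)
queries = [(random.uniform(-50, 50), random.uniform(-50, 50)) for _ in range(1000)]
bad = [q for q in queries if violates_sketch(q, point_sets, delta, y0)]
print(len(bad))
```

No violations can occur: by the triangle inequality $F_1(q) \le F_i(q) + D$, and $D \le \delta F_i(q)$ whenever $F_i(q) \ge y_0$.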



Remark 5.15. For $\mathcal{G} = \{F_1, \ldots, F_m\}$, notice that we can compute the above set $\mathcal{H}$ in $O(1)$ time and that we can compute a polynomial approximation to $\mathrm{cl}(\mathcal{G})$ in $O(m)$ time, since we can compute the diameter $\mathrm{diam}(P_1 \cup \cdots \cup P_m)$ approximately in $O(m)$ time - we simply take an arbitrary point in $P_1$ and compute furthest distances approximately for each of $P_i$ for $1 \le i \le m$ and take the maximum of these. We can use the $O(1)$ time query algorithm for furthest neighbor for this purpose.
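
The diameter estimate described in this remark is the standard factor-2 approximation: the furthest distance D from an arbitrary anchor point satisfies $D \le \mathrm{diam} \le 2D$. A minimal Python sketch follows, with one loudly stated simplification: exact linear scans stand in for the approximate constant-time furthest-neighbor queries of the text, so the running time here is linear in the total number of points rather than $O(m)$. The name `approx_diameter` is hypothetical.

```python
import math

def approx_diameter(point_sets):
    """Return D with D <= diam(P_1 u ... u P_m) <= 2 * D.

    Takes an arbitrary anchor point from the first set and returns its
    furthest distance to any point of the union.
    """
    anchor = point_sets[0][0]
    return max(max(math.dist(anchor, s) for s in P) for P in point_sets)

# Illustrative data: the true diameter of the union is 5.
sets = [[(0.0, 0.0), (1.0, 0.0)], [(3.0, 0.0)], [(-2.0, 0.0)]]
print(approx_diameter(sets))
```

Here the estimate is 3 while the true diameter is 5, which lies within the guaranteed range $[D, 2D] = [3, 6]$.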

We now consider the computability conditions (C1)-(C3). To compute $d(q, F_i)$ we use the data structure for approximate furthest neighbor queries to get a $(1-\varepsilon/4)$-approximation to this number. We run the preprocessing algorithm, see Section 4, with approximation parameter $\varepsilon/4$. By Remark 2.10, we only tile the sublevel sets $(F_i)_{\preceq y}$ with canonical cubes of size (rounded to a power of two) $\varepsilon\lambda_{F_i}(y)/4 = \varepsilon y/4$. Notice that the minimum y such that $(F_i)_{\preceq y}$ is non-empty is clearly the radius of the MEB of the point set $P_i$, and for this value $(F_i)_{\preceq y}$ just includes the center of the MEB. Let $u_i$ and $z_i$ denote the center and radius of the exact MEB, and $u_i'$ and $z_i'$ denote those computed by using the coreset. Since $z_i' \le (1+\mu)z_i$, it is not too hard to see that $\|u_i - u_i'\| \le 3\sqrt{\mu}\,z_i = \varepsilon z_i/4$. This is implied for example by Lemma A.1, presented in Appendix A, which may also be of independent interest (this assumes $\mu < 1$ which is indeed true). We are required to tile the sublevel set $(F_i)_{\preceq y}$ for some $y \ge z_i'/(1+\mu)$ using cubes of size roughly $\varepsilon y/4$, but we use in fact cubes of size roughly $\varepsilon y/c$ for some large constant c. One can consider such cubes at increasing distance from the point $u_i'$. Choosing any point within a cube one evaluates approximately the furthest neighbor distance of $P_i$, and checks if it is at most $y(1 + O(\varepsilon))$. If so, one includes the cube. Since all such cubes will intersect the ball around $u_i'$ of radius y, or a slight expansion of it, the number of such cubes is still $O(1/\varepsilon^d)$. Now, the procedure in fact guarantees that all subcubes intersecting $(F_i)_{\preceq y}$ are found, but there may be some that do not intersect it. However, this is not a problem as such cells will still be inside $(F_i)_{\preceq (1+\varepsilon/4)y}$, which is what is really required. That this works should be intuitively clear. We omit the straightforward, but tedious proof. This settles Condition (C2). Notice that the distance between $F_i$ and $F_j$ is the radius of the minimum enclosing ball of the point set $P_i \cup P_j$. Using the $(1+\mu)$-coresets that we have for the MEB of $P_i$ and $P_j$ we can compute a $(1+2\mu)$-coreset for the MEB of $P_i \cup P_j$ by simply merging those coresets. This allows us to approximately compute the distance.
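
The last observation, that the distance between $F_i$ and $F_j$ is the MEB radius of $P_i \cup P_j$, can be turned into a short computation. The Python sketch below swaps in a different, simpler technique than the merged coresets of the text: the iterative scheme of Bădoiu and Clarkson, which moves the current center a $1/(k+1)$ fraction of the way toward the furthest point in step k, and whose center is known to be within $R/\sqrt{k}$ of the optimal one after k steps (R being the optimal radius). The function names and the default iteration count are assumptions made for this example.

```python
import math

def approx_meb(points, iterations=400):
    """Approximate minimum enclosing ball via the Badoiu-Clarkson iteration.

    Start at an arbitrary input point and repeatedly move the center a
    1/(k+1) fraction of the way toward the current furthest point.
    Returns (center, radius), where radius is the furthest distance from center.
    """
    c = tuple(points[0])
    for k in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(c, p))
        c = tuple(ci + (fi - ci) / (k + 1) for ci, fi in zip(c, far))
    return c, max(math.dist(c, p) for p in points)

def function_distance(P_i, P_j, iterations=400):
    """Approximate d(F_i, F_j): the MEB radius of the union of P_i and P_j."""
    return approx_meb(list(P_i) + list(P_j), iterations)[1]

# Illustrative data: the union is four points on the unit circle, so the MEB radius is 1.
P_a = [(1.0, 0.0), (-1.0, 0.0)]
P_b = [(0.0, 1.0), (0.0, -1.0)]
r = function_distance(P_a, P_b)
print(r)
```

The returned radius never underestimates the true MEB radius, since it is the radius of an actual enclosing ball, and it overestimates by at most a $1 + 1/\sqrt{k}$ factor.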

Remark 5.16. For the computability conditions (C1)-(C3) we only showed approximate results, that is the distances were computed approximately. In fact, to be conservative, we used $\varepsilon/4$ as the approximation parameter in the construction algorithm and the furthest neighbor data structure. As a tedious but straightforward argument can show, the main lemmas Lemma 4.1 and Lemma 4.2 for near neighbor and interval range queries as well as the ones for computing the connectivity and splitting radius Lemma 4.9 and Lemma 4.1^ can work under such approximate computations, with the same running times.

We thus get the following result. 

Restatement of Theorem 3.5. Given n point sets $P_1, \ldots, P_n$ in $\mathbb{R}^d$ with a total of m points, and a parameter $\varepsilon > 0$, one can preprocess the points into an AVD, of size $O(n)$, for the nearest furthest-neighbor distance defined by these point sets. One can now answer $(1+\varepsilon)$-approximate NN queries for this distance in $O(\log n)$ time. (Note, that the space and query time used depend only on n, and not on the input size.)

Proof: We only need to show how to get the improved space and query time. Observe that every one of the sets $P_i$ can be replaced by a subset $S_i \subseteq P_i$, of size $O(1/\varepsilon^d \log(1/\varepsilon))$, such that for any point $q \in \mathbb{R}^d$, we have that $F_{S_i}(q) \le F_{P_i}(q) \le (1+\varepsilon/4)F_{S_i}(q)$. Such a subset can be computed in $O(|P_i|)$ time, see [Har99] ². We thus perform this transformation for each one of the uncertain point sets $P_1, \ldots, P_n$, which reduces the input size to $O(n/\varepsilon^d \log(1/\varepsilon))$. We now apply our main result to the distance functions induced by the reduced sets $S_1, \ldots, S_n$. ■

² One computes an appropriate exponential grid, of size $O(1/\varepsilon^d \log(1/\varepsilon))$, and picks from each grid cell one representative point from the points stored inside this cell.

6. Conclusions

In this paper, we investigated what classes of functions have minimization diagrams that can be approximated efficiently - where our emphasis was on distance functions. We defined a general framework and the requirements on the distance functions to fall under it. For this framework, we presented a new data-structure, with near linear space and preprocessing time. This data-structure can evaluate (approximately) the minimization diagram of a query point in logarithmic time. Surprisingly, one gets an AVD (approximate Voronoi diagram) of this complexity; that is, a decomposition of space with near linear complexity, such that for every region of this decomposition a single function serves as an ANN for all points in this region.

We also showed some interesting classes of functions for which we get this AVD. For example, additive and multiplicative weighted distance functions. No previous results of this kind were known, and even in the plane, multiplicative Voronoi diagrams have quadratic complexity in the worst case (for which the AVD generated has near linear complexity for any constant dimension). The framework also works for Minkowski metrics of fat convex bodies, and nearest furthest-neighbor. However, our main result applies to even more general distance functions.
Several questions remain open for further research: 

(A) Are the additional polylog factors in the space necessary? In particular, it seems unlikely that 
using WSPD's directly, as done by Arya and Malamatos [AM02], should work in the most 
general settings, so reducing the logarithmic dependency seems quite interesting. Specifically, 
can the Arya and Malamatos construction [AM02] be somehow adapted to this framework, 
possibly with some additional constraints on the functions, to get a linear space construction? 

(B) On the applications side, are constant degree polynomials a good family amenable to our framework? Specifically, consider a polynomial $\tau(x)$ that is positive for all $x \ge 0$. Given a point u, we associate the distance function $f(q) = \tau(\|q - u\|)$ with u. Given a set of such distance functions, under which conditions can one build an AVD for these functions efficiently? (It is not hard to see that in the general case this is not possible, at least under our framework.)
A. Bounding the size of intersection of balls of the same radius

Lemma A.1. Let u, z be the center and radius of the MEB for a set of points $P = \{p_1, \ldots, p_m\} \subseteq \mathbb{R}^d$. Let $\delta > 0$ be any number. Let $p \in \bigcap_{i=1}^{m} B(p_i, (1+\delta)z)$. Then,

$$\delta z \le \|p - u\| \le \sqrt{4\delta + 2\delta^2}\, z.$$

Proof: The first inequality follows from the triangle inequality. We use the fact that there are affinely independent points from P, on the surface of the MEB, such that u lies in their convex hull. Thus, assume without loss of generality that there are points $p_1, \ldots, p_k \in P$ that are affinely independent with $\|p_i - u\| = z$ and $\lambda_i \ge 0$ for $i = 1, 2, \ldots, k$ such that,

$$u = \sum_{i=1}^{k} \lambda_i p_i, \qquad \sum_{i=1}^{k} \lambda_i = 1.$$

We restrict our attention only to the points $p_1, \ldots, p_k$ since the region $\bigcap_{i=1}^{k} B(p_i, (1+\delta)z)$ contains the region $\bigcap_{i=1}^{m} B(p_i, (1+\delta)z)$. Consider an arbitrary point $p \in \bigcap_{i=1}^{k} B(p_i, (1+\delta)z)$. Let $p'$ be the projection of p to the affine subspace spanned by $p_1, \ldots, p_k$. We first bound $\|p' - u\|$. It is easy to see that $p' - u$ satisfies $\langle p' - u, p_i - u \rangle \le 0$ for some i with $1 \le i \le k$. Without loss of generality assume $i = 1$. It follows that,

$$\|p' - p_1\| \ge \sqrt{\|p' - u\|^2 + \|u - p_1\|^2} = \sqrt{z^2 + \|p' - u\|^2}.$$

On the other hand it must be the case that $\|p' - p_1\| \le \|p - p_1\| \le (1+\delta)z$. As such, $(1+\delta)^2 z^2 \ge z^2 + \|p' - u\|^2$, and we have that $\|p' - u\| \le \sqrt{2\delta + \delta^2}\, z$. We also have that,

$$(1+\delta)^2 z^2 \ge \|p - p_1\|^2 = \|p - p'\|^2 + \|p' - p_1\|^2 \ge \|p - p'\|^2 + z^2,$$

implying that $\|p - p'\| \le \sqrt{2\delta + \delta^2}\, z$. It follows by the Pythagorean theorem,

$$\|p - u\|^2 = \|p - p'\|^2 + \|p' - u\|^2 \le 2(2\delta + \delta^2) z^2,$$

and thus $\|p - u\| \le \sqrt{4\delta + 2\delta^2}\, z$. ■
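
The upper bound just derived can be sanity-checked numerically on a small instance. The Python sketch below is only an illustration under made-up data: P is the set of corners of the square $[-1,1]^2$, whose MEB has center $u = (0,0)$ and radius $z = \sqrt{2}$, and points of the intersection of the expanded balls are found by rejection sampling. The helper name `in_expanded_intersection` is hypothetical.

```python
import math
import random

def in_expanded_intersection(p, points, z, delta):
    """True if p lies in every ball B(p_i, (1 + delta) * z)."""
    return all(math.dist(p, pi) <= (1 + delta) * z for pi in points)

# Corners of the square [-1, 1]^2: the MEB has center u = (0, 0), radius z = sqrt(2).
points = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
u, z, delta = (0.0, 0.0), math.sqrt(2.0), 0.1
bound = math.sqrt(4 * delta + 2 * delta ** 2) * z

random.seed(1)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20000)]
inside = [p for p in samples if in_expanded_intersection(p, points, z, delta)]
worst = max(math.dist(p, u) for p in inside)
print(len(inside), worst, bound)
```

Every sampled point of the intersection stays well within distance $\sqrt{4\delta + 2\delta^2}\,z$ of the MEB center, as the lemma predicts.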

B. Basic properties of the functions 

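Lemma B.1. Let $\mathcal{F}$ be a set of functions that satisfy the compactness (P1) and bounded growth (P2) conditions. Then, for any $f \in \mathcal{F}$, either $f_{\preceq 0} = \emptyset$ or $f_{\preceq 0}$ consists of a single point.

Proof: If $f_{\preceq 0}$ contains at least two points, then by compactness (P1) of $f_{\preceq 0}$ there are two points $x, y \in f_{\preceq 0}$ such that $\|x - y\| = \mathrm{diam}(f_{\preceq 0}) > 0$. By the bounded growth (P2) it follows that

$$f_{\preceq 0} \subseteq f_{\preceq 0} \oplus B\left(0, \frac{\|x - y\|}{\zeta}\right) \subseteq f_{\preceq 0} \oplus B(0, \lambda_f(0)) \subseteq f_{\preceq 0},$$

using $\varepsilon = 1$ and the fact that $\lambda_f(0) \ge \mathrm{diam}(f_{\preceq 0})/\zeta = \|x - y\|/\zeta$. Thus, $f_{\preceq 0} \oplus B\left(0, \frac{\|x - y\|}{\zeta}\right) = f_{\preceq 0}$. Clearly in $y \oplus B\left(0, \frac{\|x - y\|}{\zeta}\right)$ there is some $y'$ such that $\|x - y'\| > \|x - y\|$, which contradicts that x and y are a diametrical pair in $f_{\preceq 0}$. ■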
By the above lemma, we may assume that a symbolic perturbation guarantees that $d(f, g) > 0$ for $f \neq g$. With this convention we have the following,

Observation B.2. If $\mathrm{cl}(\mathcal{G}) = 0$ for any non-empty subset $\mathcal{G}$ then $|\mathcal{G}| = 1$.

We also assume that the quantities $d(f, g)$ are distinct for all distinct pairs of functions.

Lemma B.3. Let $f \in \mathcal{F}$ and $y \ge 0$. Suppose $u, v \in f_{\preceq y}$. Then, $uv \subseteq f_{\preceq (1+\zeta/2)y}$, where uv denotes the segment joining u to v.

Proof: If $u = v$, the claim is obvious. Using bounded growth (P2) with $\varepsilon = \zeta/2$, and the inequality $\lambda_f(y) \ge \mathrm{diam}(f_{\preceq y})/\zeta$, it follows that $f_{\preceq y} \oplus B(0, \mathrm{diam}(f_{\preceq y})/2) \subseteq f_{\preceq (1+\zeta/2)y}$. Thus, $B(u, \mathrm{diam}(f_{\preceq y})/2) \subseteq f_{\preceq (1+\zeta/2)y}$ as well as $B(v, \mathrm{diam}(f_{\preceq y})/2) \subseteq f_{\preceq (1+\zeta/2)y}$. Since $\|u - v\| \le \mathrm{diam}(f_{\preceq y})$ it follows that the entire segment uv is in $f_{\preceq (1+\zeta/2)y}$. ■

Lemma B.4. Let $A_1, \ldots, A_k \subseteq \mathbb{R}^d$ be compact connected sets. Let uv be any segment. Suppose that $uv \cap A_i \neq \emptyset$ for all $1 \le i \le k$ and $uv \subseteq \bigcup_{i=1}^{k} A_i$. Then, the sets $A_i$, $1 \le i \le k$, are connected.

Proof: It is sufficient to prove the claim for $A_i \subseteq uv$, as the truth of the claim for the compact sets $A_i \cap uv$ implies the truth for $A_i$. Thus, assume $A_i \subseteq uv$. Suppose the claim is false. Consider the intersection graph of the $A_i$, $1 \le i \le k$. This graph has at least two components by assumption. Let $B_1, \ldots, B_l$ be the partition of $[1, k]$ that defines these components, i.e. for each $1 \le i \le l$, the sets $A_j$, $j \in B_i$, are connected, and $A_x \cap A_y = \emptyset$ for $1 \le x, y \le k$ if x, y belong to different $B_i$. Denote by $C_i = \bigcup_{j \in B_i} A_j$ for $1 \le i \le l$. Clearly each $C_i$ is compact. By an easy compactness argument, there are distinct $1 \le i_1, i_2 \le l$ such that for points $s \in C_{i_1}$, $t \in C_{i_2}$, we have that

$$0 < \|s - t\| = \min_{1 \le x \neq y \le l,\; p \in C_x,\; q \in C_y} \|p - q\|.$$

However, this is impossible as s, t are distinct points on uv and the segment st is therefore covered by the $C_i$, $1 \le i \le l$. It follows that a smaller distance between distinct $C_i$ must be attainable. ■

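Lemma B.5. Suppose we are given $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$, $\delta \ge 0$ and $y \ge 0$, and $\mathcal{H}$ is a $(\delta, y)$-sketch for $\mathcal{G}$. Then, $\mathrm{cl}(\mathcal{H}) \le (1+\delta)(1+\zeta/2)\max(y, \mathrm{cl}(\mathcal{G}))$.

Proof: Assume that $\mathcal{G} = \{f_1, \ldots, f_m\}$ and $\mathcal{H} = \{f_1, \ldots, f_k\}$ where $k \le m$. If $m = 1$ then $k = 1$ and we have by definition $\mathrm{cl}(\mathcal{G}) = \mathrm{cl}(\mathcal{H}) = 0$ and the result clearly holds true. If $m > 1$, we need to show that $(f_i)_{\preceq y'}$, for $i = 1, \ldots, k$, are connected, where $y' = (1+\delta)(1+\zeta/2)l$ and $l = \max(y, \mathrm{cl}(\mathcal{G}))$. Now by definition, $\mathcal{G}_{\preceq l}$ is a connected set. Consider any $1 \le i \neq j \le k$. Then there is a sequence of distinct indices $i = i_1, i_2, \ldots, i_s = j$ such that $(f_{i_r})_{\preceq l} \cap (f_{i_{r+1}})_{\preceq l} \neq \emptyset$ for $1 \le r \le s-1$. Consider any such index, say $i_r$, such that $i_r > k$, i.e. $f_{i_r} \notin \mathcal{H}$. Since $(f_{i_r})_{\preceq l} \cap (f_{i_{r-1}})_{\preceq l} \neq \emptyset$ and $(f_{i_r})_{\preceq l} \cap (f_{i_{r+1}})_{\preceq l} \neq \emptyset$ we can choose points $u \in (f_{i_{r-1}})_{\preceq l} \cap (f_{i_r})_{\preceq l}$ and $v \in (f_{i_r})_{\preceq l} \cap (f_{i_{r+1}})_{\preceq l}$. Now the entire segment $uv \subseteq (f_{i_r})_{\preceq (1+\zeta/2)l}$ by Lemma B.3. Since $(1+\zeta/2)l \ge y$ it follows by the sketch property (P3), that $uv \subseteq (f_{i_r})_{\preceq (1+\zeta/2)l} \subseteq \bigcup_{g \in \mathcal{H}} g_{\preceq (1+\zeta/2)(1+\delta)l}$. By Lemma B.4 the sets in the minimal cover of uv by the sublevel sets $(f_i)_{\preceq (1+\zeta/2)(1+\delta)l}$, $1 \le i \le k$, are connected. It follows that $(f_{i_r})_{\preceq (1+\zeta/2)l}$ can be replaced by a sub-collection of the $(f_i)_{\preceq (1+\zeta/2)(1+\delta)l}$, $1 \le i \le k$, and the property of neighbor intersections is still valid in the chain. We replace each occurrence of the set $(f_{i_r})_{\preceq (1+\zeta/2)l}$ for $i_r > k$ by the corresponding chain. It is easy to see that the resulting chain connects up $(f_{i_1})_{\preceq (1+\zeta/2)(1+\delta)l}$ and $(f_{i_s})_{\preceq (1+\zeta/2)(1+\delta)l}$. Now, duplicate elements can be easily removed without affecting the neighbor intersection property of the chain. ■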

The following testifies that a sketch approximates the distance of a set of functions.

Lemma B.6. Let $\mathcal{H} \subseteq \mathcal{G}$ be sets of functions, where $\mathcal{H}$ is a $(\delta, y_0)$-sketch for $\mathcal{G}$ for some $\delta \ge 0$ and $y_0 \ge 0$. Let q be a point such that $d(q, \mathcal{G}) \ge y_0$. Then we have that $d(q, \mathcal{H}) \le (1+\delta)\,d(q, \mathcal{G})$.

Proof: Let $l = d(q, \mathcal{G})$ and let $f \in \mathcal{G}$ be a witness that $q \in f_{\preceq l}$. As $l \ge y_0$ we have that $f_{\preceq l} \subseteq \bigcup_{g \in \mathcal{H}} g_{\preceq (1+\delta)l}$ by the sketch property (Definition 2.9). As such there is some function $g \in \mathcal{H}$ such that $q \in g_{\preceq (1+\delta)l}$. It follows that $d(q, g) \le (1+\delta)\,d(q, \mathcal{G})$. ■